Reasoning in Large Language Models: A Geometric Perspective

Jul, 2024

Romain Cosentino, Sarath Shekkizhar

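TL;DR: We study the reasoning abilities of large language models (LLMs) through a geometric understanding of them. We establish a connection between the expressive power of an LLM and the density of its self-attention graphs, show through theoretical analysis and toy examples that a higher intrinsic dimension implies greater expressive capacity, and provide empirical evidence linking this geometric framework to recent advances in methods for enhancing LLM reasoning.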

Abstract

The advancement of large language models (LLMs) for real-world applications hinges critically on enhancing their reasoning capabilities. In this work, we explore the reasoning abilities of →