Understanding the Expressive Power and Mechanisms of Transformer for Sequence Modeling

Feb, 2024

Mingze Wang, Weinan E

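TL;DR: We systematically study the approximation properties of Transformer for sequence modeling with long, sparse, and complicated memory. We investigate the mechanisms through which different components of Transformer (such as dot-product self-attention, positional encoding, and the feed-forward layer) affect its expressive power, and we study their combined effects by establishing explicit approximation rates. Our study reveals the roles of critical parameters in Transformer, such as the number of layers and the number of attention heads, and offers natural suggestions for alternative architectures.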

Abstract

We conduct a systematic study of the approximation properties of transformer for sequence modeling with long, sparse and complicated memor