BriefGPT.xyz
Oct, 2024
A Formal Framework for Understanding Length Generalization in Transformers
Xinting Huang, Andy Yang, Satwik Bhattamishra, Yash Sarrof, Andreas Krebs...
TL;DR
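This work addresses the poor generalization of Transformers to inputs longer than the sequences seen during training. We propose a rigorous theoretical framework for analyzing length generalization in causal Transformers with learnable absolute positional encodings, and prove that length generalization is possible for a class of problems. The theory not only explains many empirical observations but also provides a provable way to predict the length generalization ability of Transformers.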
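To make the setting concrete, the sketch below shows the two ingredients the summary names: a learnable absolute positional encoding (one trainable vector per position index, stored in a lookup table) and causal attention (position i attends only to positions j ≤ i). It is a minimal pure-Python illustration under stated assumptions, not the paper's framework or construction: the sizes `D_MODEL` and `MAX_TRAIN_LEN`, the helper names, and the random table values (stand-ins for trained weights) are all hypothetical. It highlights why length generalization is nontrivial here: the table has rows only for positions seen in training, so a longer input either fails outright or needs rows that were never trained.

```python
import math
import random

# Hypothetical sizes for illustration only (not taken from the paper).
D_MODEL = 4
MAX_TRAIN_LEN = 8

random.seed(0)

# A learnable absolute positional encoding is a lookup table with one
# trainable vector per position index. Random values stand in for trained ones.
pos_table = [[random.gauss(0.0, 0.02) for _ in range(D_MODEL)]
             for _ in range(MAX_TRAIN_LEN)]


def add_positions(token_vecs, table):
    """Add the positional vector for index i to the i-th token vector."""
    if len(token_vecs) > len(table):
        # Positions >= len(table) have no (trained) embedding at all.
        raise IndexError(
            f"no positional embedding for positions >= {len(table)}")
    return [[t + p for t, p in zip(tok, table[i])]
            for i, tok in enumerate(token_vecs)]


def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def causal_attention_weights(scores):
    """Row-wise softmax restricted to the causally visible prefix."""
    n = len(scores)
    mask = causal_mask(n)
    weights = []
    for i in range(n):
        visible = [scores[i][j] for j in range(n) if mask[i][j]]
        m = max(visible)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in visible]
        z = sum(exps)
        # Visible positions are 0..i; future positions get zero weight.
        row = [e / z for e in exps] + [0.0] * (n - len(visible))
        weights.append(row)
    return weights
```

Calling `add_positions` on a sequence of length `MAX_TRAIN_LEN` works, while a sequence of length `MAX_TRAIN_LEN + 1` raises `IndexError`; extending the table would avoid the error but the new rows would carry untrained values, which is one intuition for why learnable absolute encodings make length generalization hard.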
Abstract
A major challenge for Transformers is generalizing to sequences longer than those observed during training. While previous works have empirically shown that Transformers can either succeed or fail at