BriefGPT.xyz
Oct, 2024
Arithmetic Transformers Can Length-Generalize in Both Operand Length and Count
Hanseul Cho, Jaeyoung Cha, Srinadh Bhojanapalli, Chulhee Yun
TL;DR
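This work addresses the weakness of Transformers in length generalization, particularly on multi-operand addition and multiplication tasks. By designing task-specific scratchpads and applying multi-level Position Coupling, the authors report the first arithmetic Transformers to achieve roughly 2-3x length generalization. The result may have a significant impact on advancing the arithmetic capabilities of such models.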
Abstract
Transformers often struggle with length generalization, meaning they fail to generalize to sequences longer than those encountered during training. While