BriefGPT.xyz
Oct, 2023
What Algorithms Can Transformers Learn? A Study in Length Generalization
Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi...
TL;DR
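Large language models have shown surprising emergent generalization properties, yet they still struggle on many simple reasoning tasks such as arithmetic and parity. This study examines the scope of length generalization on algorithmic tasks and proposes a unifying framework that explains when and how Transformer models exhibit this ability on a given task.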
Abstract
Large language models exhibit surprising emergent generalization properties, yet also struggle on many simple reasoning tasks such as arithmetic and parity. This raises the question of if and when Transformer mod