Feb, 2024
Transformers Can Achieve Length Generalization But Not Robustly
Yongchao Zhou, Uri Alon, Xinyun Chen, Xuezhi Wang, Rishabh Agarwal, et al.
TL;DR
Using an appropriate combination of data format and position encoding, this work is the first to show that standard Transformers can extrapolate to sequences 2.5× the training input length. Unlike in-distribution generalization, however, length generalization remains fragile: it is strongly affected by factors such as random weight initialization and training-data order, producing large variance across random seeds.
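To make the evaluation protocol behind this claim concrete, here is a minimal sketch of what testing length generalization across seeds can look like: train on sequences up to some maximum length, evaluate at 2.5× that length, and repeat with several random seeds to expose the variance the authors report. The toy task (digit addition), the stub model, and all names here are illustrative assumptions, not the authors' code.

import random

MAX_TRAIN_LEN = 40   # assumed max digit count seen during training
MAX_TEST_LEN = 100   # 2.5x the training length

def make_example(n_digits, rng):
    # Sample two n-digit integers and their sum as an (input, target) pair.
    a = rng.randrange(10 ** (n_digits - 1), 10 ** n_digits)
    b = rng.randrange(10 ** (n_digits - 1), 10 ** n_digits)
    return f"{a}+{b}", str(a + b)

def stub_model(prompt):
    # Placeholder for a trained Transformer; it "cheats" with eval()
    # so the harness runs end to end.
    return str(eval(prompt))

def accuracy(model, n_digits, rng, n_samples=200):
    correct = 0
    for _ in range(n_samples):
        prompt, target = make_example(n_digits, rng)
        correct += model(prompt) == target
    return correct / n_samples

for seed in range(5):   # repeat across seeds to expose variance
    rng = random.Random(seed)
    # In a real run, a Transformer would be trained from scratch here
    # under this seed before being evaluated at the longer length.
    acc = accuracy(stub_model, MAX_TEST_LEN, rng)
    print(f"seed={seed} test_len={MAX_TEST_LEN} acc={acc:.3f}")

In the paper's setting, the stub would be replaced by a model trained under each seed; the seed-to-seed spread in out-of-length accuracy is exactly the fragility the TL;DR describes.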
Abstract
Length generalization, defined as the ability to extrapolate from shorter training sequences to longer test ones, is a significant challenge for language models. This issue persists even with large-scale …