BriefGPT.xyz
Jul, 2022
Exploring Length Generalization in Large Language Models
Cem Anil, Yuhuai Wu, Anders Andreassen, Aitor Lewkowycz, Vedant Misra...
TL;DR
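This study examines the length generalization ability of transformer-based language models. It finds that combining the in-context learning ability of pretrained large language models with scratchpad prompting greatly improves length generalization. It also identifies common sources of error, which opens new opportunities for equipping language models with the ability to generalize to longer problems.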
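To make the idea of scratchpad prompting concrete, here is a minimal Python sketch of how a few-shot prompt with intermediate steps might be assembled. The parity task, the prompt wording, and the helper names `parity_scratchpad` and `build_prompt` are illustrative assumptions and are not taken from this summary. The point is only that short worked examples spell out each step, and the model is then asked to continue the same pattern on a longer input.

```python
def parity_scratchpad(bits):
    """Return scratchpad lines that track the running parity of `bits`.

    Hypothetical format: one line per input bit, then a final answer line.
    """
    lines = []
    parity = 0
    for i, b in enumerate(bits, start=1):
        parity ^= b  # running XOR of all bits seen so far
        lines.append(f"step {i}: bit={b}, parity so far={parity}")
    lines.append(f"answer: {parity}")
    return "\n".join(lines)


def build_prompt(examples, query):
    """Assemble a few-shot prompt: worked short examples, then the longer query.

    The prompt ends with an open "step 1:" so a model would continue the
    scratchpad itself. No model is called here.
    """
    parts = []
    for bits in examples:
        shown = " ".join(map(str, bits))
        parts.append(f"input: {shown}\n{parity_scratchpad(bits)}")
    shown = " ".join(map(str, query))
    parts.append(f"input: {shown}\nstep 1:")
    return "\n\n".join(parts)


# Short in-context examples (lengths 2 and 3), longer query (length 6).
prompt = build_prompt([[1, 0], [1, 1, 0]], [1, 0, 1, 1, 0, 1])
print(prompt)
```

The resulting string would be sent to a language model as-is. Whether the model completes the longer instance correctly is exactly the length generalization question the study investigates.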
Abstract
The ability to extrapolate from short problem instances to longer ones is an important form of out-of-distribution generalization in reasoning tasks, and is crucial when learning from datasets where longer proble...