BriefGPT.xyz
Oct, 2023
Improving Length-Generalization in Transformers via Task Hinting
Pranjal Awasthi, Anupam Gupta
TL;DR
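The paper uses task hinting to improve length generalization and validates the approach on the classical sorting problem. Using probing and visualization techniques, it proposes a theoretical construction of the behavior the model learns, which further improves performance on unseen lengths.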
Abstract
It has been observed in recent years that transformers have problems with length generalization for certain types of reasoning and arithmetic tasks. In particular, the performance of a transformer model trained o