BriefGPT.xyz
Jun, 2023
Length Generalization in Arithmetic Transformers
Samy Jelassi, Stéphane d'Ascoli, Carles Domingo-Enrich, Yuhuai Wu, Yuanzhi Li...
TL;DR
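This paper studies how transformers cope with two challenges: learning basic integer arithmetic, and generalizing to sequences longer than those seen during training. It finds that relative position embeddings enable length generalization on simple tasks but fail for multiplication. To address this, the authors propose train set priming, which adds a few long sequences to the training set, and show that the method is effective. The paper also discusses potential applications of priming beyond arithmetic.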
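As a rough illustration of the priming idea summarized above, the sketch below builds a multiplication training set in which most examples use short operands and a small fraction use long ones. The function names, the `a*b` text format, the digit lengths, and the priming rate are all illustrative assumptions, not settings taken from the paper.

```python
import random


def sample_operand(num_digits, rng):
    """Draw a uniform random integer with exactly num_digits digits."""
    low = 10 ** (num_digits - 1)
    high = 10 ** num_digits - 1
    return rng.randint(low, high)


def make_primed_dataset(n_examples, short_digits, long_digits,
                        priming_rate, other_digits=3, seed=0):
    """Build (question, answer) pairs for multiplication.

    Hypothetical helper: each example uses a long first operand with
    probability priming_rate, and a short one otherwise. The second
    operand always has other_digits digits (an assumption made here
    to keep the sketch small).
    """
    rng = random.Random(seed)
    examples = []
    for _ in range(n_examples):
        # Occasionally draw a long "priming" example.
        is_primed = rng.random() < priming_rate
        digits = long_digits if is_primed else short_digits
        a = sample_operand(digits, rng)
        b = sample_operand(other_digits, rng)
        examples.append((f"{a}*{b}", str(a * b)))
    return examples


if __name__ == "__main__":
    # Mostly 5-digit first operands, with a small share of 35-digit ones.
    for question, answer in make_primed_dataset(5, 5, 35, 0.05):
        print(question, "=", answer)
```

Mixing by a per-example probability is only one way to do it. Appending a fixed number of long examples to an otherwise short-operand training set would express the same idea.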
Abstract
We examine how transformers cope with two challenges: learning basic integer arithmetic, and generalizing to longer sequences than seen during training. We find that relative…