BriefGPT.xyz
Oct, 2024
Looking Beyond The Top-1: Transformers Determine Top Tokens In Order
Daria Lioubashevski, Tomer Schlank, Gabriel Stanovsky, Ariel Goldstein
TL;DR
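This study examines the computation Transformers perform after the top-1 token prediction has become fixed, filling a gap in the understanding of "saturation events". We propose a task-transition mechanism that explains why these saturation events occur in order, and that lays the groundwork for a new token-level early-exit strategy, which significantly improves the balance between performance and efficiency.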
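To make the idea of a saturation event concrete, here is a minimal Python sketch. It assumes that each layer's output has already been projected to a ranked list of token ids, and it treats the saturation layer for rank k as the earliest layer from which the k-th ranked token matches the final layer's k-th ranked token at every later layer. The function name `saturation_layer` and the toy rankings are hypothetical and serve only as illustration; they are not taken from the paper.

```python
from typing import Sequence


def saturation_layer(ranked_per_layer: Sequence[Sequence[int]], k: int = 1) -> int:
    """Return the earliest layer index from which the k-th ranked token
    equals the final layer's k-th ranked token at every later layer.

    Assumption: ranked_per_layer[l] lists token ids at layer l,
    sorted from most to least likely.
    """
    final_token = ranked_per_layer[-1][k - 1]
    layer = len(ranked_per_layer) - 1
    # Walk backwards while the k-th ranked token still matches the final one.
    while layer > 0 and ranked_per_layer[layer - 1][k - 1] == final_token:
        layer -= 1
    return layer


# Toy data (hypothetical): six layers, three ranked token ids per layer.
toy_rankings = [
    [7, 3, 5],
    [2, 3, 5],
    [2, 5, 3],
    [2, 9, 3],
    [2, 9, 4],
    [2, 9, 4],
]

saturation_layers = [saturation_layer(toy_rankings, k) for k in (1, 2, 3)]
print(saturation_layers)  # [1, 3, 4]
```

In this toy input the top-1 token settles at layer 1, the second-ranked token at layer 3, and the third at layer 4, which is the kind of ordered pattern the title refers to.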
Abstract
Understanding the inner workings of Transformers is crucial for achieving more accurate and efficient predictions. In this work, we analyze the computation performed by Transformers in the layers after the top-1 …