BriefGPT.xyz
May 2025
Multi-Token Prediction Needs Registers
Anastasios Gerontopoulos, Spyros Gidaris, Nikos Komodakis
TL;DR
This work addresses the problem that the benefits of multi-token prediction in language model pretraining have not consistently carried over to other settings such as fine-tuning. The proposed MuToR method interleaves learnable register tokens into the input sequence, each tasked with predicting future targets. MuToR performs well across a range of use cases, is particularly well suited to supervised fine-tuning, and remains compatible with the standard next-token pretraining objective.
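The paper itself specifies the exact interleaving and supervision scheme; as a rough illustration of the idea only, the sketch below (PyTorch; the `RegisterInterleaver` module name and the `stride` parameter are hypothetical choices made here, not taken from the paper) inserts one shared learnable register embedding after every few input tokens and returns a mask marking which positions are registers.

```python
import torch
import torch.nn as nn


class RegisterInterleaver(nn.Module):
    """Illustrative sketch, not the authors' implementation: interleave a
    learnable register token after every `stride` real tokens. Register
    positions could then be supervised with future-token targets, while the
    remaining positions keep the ordinary next-token objective."""

    def __init__(self, d_model: int, stride: int = 4):
        super().__init__()
        self.stride = stride
        # A single learnable register embedding, shared across all insertions.
        self.register = nn.Parameter(torch.randn(d_model) * 0.02)

    def forward(self, token_embeds: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        # token_embeds: (batch, seq_len, d_model)
        batch, seq_len, d_model = token_embeds.shape
        chunks, is_register = [], []
        for start in range(0, seq_len, self.stride):
            chunk = token_embeds[:, start:start + self.stride]
            chunks.append(chunk)
            is_register.extend([False] * chunk.shape[1])
            # Insert one register token after each chunk of real tokens.
            chunks.append(self.register.expand(batch, 1, d_model))
            is_register.append(True)
        interleaved = torch.cat(chunks, dim=1)          # (batch, longer_seq, d_model)
        register_mask = torch.tensor(is_register)       # True at register positions
        return interleaved, register_mask
```

In such a setup, the positions flagged by `register_mask` would receive future-token labels (e.g. the token a few steps ahead), while all other positions keep their usual next-token labels, which is what keeps the scheme compatible with standard pretraining; at inference the registers can simply be dropped.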
Abstract
Multi-token prediction has emerged as a promising objective for improving language model pretraining, but its benefits have not consistently generalized to other settings such as fine-tuning. […]