Apr, 2024
Better & Faster Large Language Models via Multi-token Prediction
Fabian Gloeckle, Badr Youbi Idrissi, Baptiste Rozière, David Lopez-Paz, Gabriel Synnaeve
TL;DR
We propose training language models to predict multiple future tokens in order to improve sample efficiency, and we find that this also improves downstream capabilities: when multi-token prediction is used as an auxiliary training task, both code and natural language generative models show notable gains.
Abstract
Large language models such as GPT and Llama are trained with a next-token prediction loss. In this work, we suggest that training language models to predict multiple future tokens at once results in higher sample efficiency.
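The abstract describes multi-token prediction only at a high level. Below is a minimal PyTorch sketch of the idea under the setup the TL;DR implies: a shared trunk whose hidden states feed n independent output heads, where head i predicts the token (i + 1) positions ahead and the loss sums the per-head cross-entropies. The class names, trunk architecture, and hyperparameters are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyTrunk(nn.Module):
    """Hypothetical stand-in for the shared transformer trunk: embeds token
    ids and runs a small causal encoder, returning (batch, seq, d_model)."""

    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # Causal mask so position t only attends to positions <= t.
        causal = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        return self.encoder(self.embed(tokens), mask=causal)


class MultiTokenLM(nn.Module):
    """Sketch of multi-token prediction: one shared trunk, n_future output
    heads; head i scores the token (i + 1) positions ahead of each state."""

    def __init__(self, trunk: nn.Module, d_model: int, vocab_size: int,
                 n_future: int = 4):
        super().__init__()
        self.trunk = trunk
        self.heads = nn.ModuleList(
            nn.Linear(d_model, vocab_size) for _ in range(n_future)
        )

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        T = tokens.size(1)
        hidden = self.trunk(tokens)  # (batch, T, d_model), shared by all heads
        loss = torch.tensor(0.0)
        for i, head in enumerate(self.heads):
            # Hidden state at position t predicts the token at t + i + 1,
            # so drop the last (i + 1) states and the first (i + 1) tokens.
            logits = head(hidden[:, : T - i - 1])   # (batch, T-i-1, vocab)
            target = tokens[:, i + 1:]              # (batch, T-i-1)
            loss = loss + F.cross_entropy(
                logits.reshape(-1, logits.size(-1)), target.reshape(-1)
            )
        # Total loss: sum of per-head cross-entropies; the standard
        # next-token loss is the i = 0 term.
        return loss


# Toy usage: one forward/backward pass on random token ids.
model = MultiTokenLM(TinyTrunk(vocab_size=256, d_model=64),
                     d_model=64, vocab_size=256)
loss = model(torch.randint(0, 256, (2, 32)))
loss.backward()
```

Because every head reads the same trunk activations, the extra heads reuse one forward and backward pass through the body of the model, which is why multi-token prediction can serve as a cheap auxiliary task alongside ordinary next-token training.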