Aug, 2024
NDP: Next Distribution Prediction as a More Broad Target
Junhao Ruan, Abudukeyumu Abudula, Xinyu Liu, Bei Li, Yinqiao Li...
TL;DR
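This study critiques the limitations of the existing next-token prediction (NTP) paradigm, particularly with respect to task complexity and error propagation at inference time. It introduces next distribution prediction (NDP), which replaces one-hot targets with $n$-gram distributions, and reports significant performance gains in translation, general tasks, and medical domain adaptation, pointing to a new research direction for improving NTP.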
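To make the contrast between a one-hot target and an $n$-gram distribution target concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the toy corpus, the choice of bigrams ($n = 2$), the helper names `bigram_targets` and `cross_entropy`, and the model probabilities are all assumptions made for illustration. How NDP actually builds and uses its $n$-gram distributions is not described in this summary.

```python
from collections import Counter, defaultdict
import math


def bigram_targets(tokens):
    """Map each context token to the empirical distribution of its successors."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return {
        prev: {tok: c / sum(ctr.values()) for tok, c in ctr.items()}
        for prev, ctr in counts.items()
    }


def cross_entropy(target, predicted):
    """Cross-entropy H(target, predicted) in nats; target may be one-hot or soft."""
    return -sum(p * math.log(predicted[tok]) for tok, p in target.items() if p > 0)


# Toy corpus (an assumption): "the" is followed by "cat" twice and "mat" once.
corpus = "the cat sat on the mat the cat ran".split()
targets = bigram_targets(corpus)

soft_target = targets["the"]     # distribution over successors of "the"
one_hot_target = {"cat": 1.0}    # NTP-style target for a single occurrence

# Hypothetical model output for the context "the".
model_probs = {"cat": 0.5, "mat": 0.25, "sat": 0.25}

loss_hard = cross_entropy(one_hot_target, model_probs)
loss_soft = cross_entropy(soft_target, model_probs)
```

With the one-hot target, the loss only rewards probability placed on "cat". With the bigram target, the loss also penalizes the model for underweighting "mat", which is the sense in which a distribution is a broader supervision signal than a single token.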
Abstract
Large language models (LLMs) trained on the next-token prediction (NTP) paradigm have demonstrated powerful capabilities. However, the existing NTP paradigm has several limitations, particularly related to planned task complications and ...