Oct 2023
Do pretrained Transformers Really Learn In-context by Gradient Descent?
Lingfeng Shen, Aayush Mishra, Daniel Khashabi
TL;DR
In realistic natural-language settings, the paper compares the behavior of In-Context Learning (ICL) and Gradient Descent (GD) on language models, and finds that the two behave inconsistently in how they adapt the model's output distribution.
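To make the comparison concrete, here is a minimal sketch (not the authors' code) of one way to contrast the two: condition a model on demonstrations in the prompt (ICL) versus fine-tuning a copy of the same model on those demonstrations with a few GD steps, then compare the resulting next-token distributions. The model choice, demonstrations, step count, learning rate, and KL metric are all illustrative assumptions, not the paper's exact protocol.

```python
# Sketch: compare a model's next-token distribution under ICL (demos in the
# prompt) vs. GD (a few fine-tuning steps on the same demos). Illustrative
# setup only; gpt2, the demos, and hyperparameters are arbitrary choices.
import copy
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

demos = "great movie -> positive\nterrible plot -> negative\n"
query = "wonderful acting -> "

def next_token_dist(m, text):
    """Distribution over the next token given `text`."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = m(ids).logits[0, -1]
    return F.softmax(logits, dim=-1)

# ICL: condition on the demonstrations in the prompt, weights untouched.
p_icl = next_token_dist(model, demos + query)

# GD: fine-tune a copy of the model on the same demonstrations.
gd_model = copy.deepcopy(model)
gd_model.train()
opt = torch.optim.SGD(gd_model.parameters(), lr=1e-3)
ids = tok(demos, return_tensors="pt").input_ids
for _ in range(10):
    loss = gd_model(ids, labels=ids).loss  # standard causal LM loss
    loss.backward()
    opt.step()
    opt.zero_grad()
gd_model.eval()
p_gd = next_token_dist(gd_model, query)  # bare query, demos now in weights

# If ICL were implicitly GD, these distributions should be close.
kl = F.kl_div(p_gd.log(), p_icl, reduction="sum")  # KL(p_icl || p_gd)
print(f"KL(ICL || GD) = {kl.item():.4f}")
```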
Abstract
Is in-context learning (ICL) implicitly equivalent to gradient descent (GD)? Several recent works draw analogies between the dynamics of GD and the emergent behavior of ICL in large language models …