Feb 2025
Which Attention Heads Matter for In-Context Learning?
Kayo Yin, Jacob Steinhardt
TL;DR
This study examines the distinct mechanisms underlying in-context learning (ICL) in large language models, focusing on the relationship between induction heads and function vector (FV) heads. The results show that FV heads play the dominant role in few-shot learning performance, and that induction heads facilitate the learning of the FV mechanism during training. These findings offer a new perspective on how language models acquire their learning mechanisms.
Abstract
Large language models (LLMs) exhibit impressive in-context learning (ICL) capability, enabling them to perform new tasks using only a few demonstrations in the prompt. Two different mechanisms have been proposed to explain ICL: induction heads and function vector (FV) heads.
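
As an illustration of the kind of head-level intervention behind such comparisons, the sketch below masks a chosen set of attention heads in GPT-2 via the head_mask argument of Hugging Face transformers and compares the log-probability the model assigns to a few-shot answer before and after ablation. The model, head indices, and prompt are placeholders for illustration, not the paper's actual setup, and the authors' own ablation procedure may differ.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

# Hypothetical (layer, head) pairs to ablate, e.g. candidate FV heads.
heads_to_ablate = [(9, 6), (10, 1)]

def answer_logprob(prompt, answer, head_mask=None):
    # Log-probability of `answer` given the few-shot prompt, optionally with heads masked.
    enc = tokenizer(prompt + answer, return_tensors="pt")
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    with torch.no_grad():
        logits = model(enc.input_ids, head_mask=head_mask).logits
    # Each answer token is predicted from the position just before it.
    logprobs = torch.log_softmax(logits[0, prompt_len - 1:-1], dim=-1)
    answer_ids = enc.input_ids[0, prompt_len:]
    return logprobs[torch.arange(answer_ids.shape[0]), answer_ids].sum().item()

# head_mask has shape (n_layers, n_heads); 1.0 keeps a head, 0.0 masks its attention weights.
mask = torch.ones(model.config.n_layer, model.config.n_head)
for layer, head in heads_to_ablate:
    mask[layer, head] = 0.0

prompt = "France -> Paris\nJapan -> Tokyo\nItaly ->"
answer = " Rome"
print("intact :", answer_logprob(prompt, answer))
print("ablated:", answer_logprob(prompt, answer, head_mask=mask))

Masking attention weights is only one possible ablation choice; mean-ablating per-head outputs, as is common in this line of work, would instead require a forward hook on the attention module's pre-projection outputs.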