Feb 2025
Which Attention Heads Matter for In-Context Learning?
Kayo Yin, Jacob Steinhardt
TL;DR
This study examines the distinct mechanisms underlying in-context learning (ICL) in large language models, focusing on the relationship between induction heads and function vector (FV) heads. The results show that FV heads play the dominant role in few-shot learning performance, and that induction heads facilitate the learning of the FV mechanism during training. These findings offer a new perspective on how language models acquire their learning mechanisms.
Abstract
Large language models (LLMs) exhibit impressive in-context learning (ICL) capability, enabling them to perform new tasks using only a few demonstrations in the prompt. Two different mechanisms have been proposed to explain ICL: induction heads and function vector (FV) heads.
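
As an illustration of the kind of head-level intervention behind such comparisons, the sketch below masks a chosen set of attention heads in GPT-2 via the head_mask argument of Hugging Face transformers and compares the log-probability the model assigns to a few-shot answer before and after ablation. The model, head indices, and prompt are placeholders for illustration, not the paper's actual setup, and the authors' own ablation procedure may differ.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

# Hypothetical (layer, head) pairs to ablate, e.g. candidate FV heads.
heads_to_ablate = [(9, 6), (10, 1)]

def answer_logprob(prompt, answer, head_mask=None):
    # Log-probability of `answer` given the few-shot prompt, optionally with heads masked.
    enc = tokenizer(prompt + answer, return_tensors="pt")
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    with torch.no_grad():
        logits = model(enc.input_ids, head_mask=head_mask).logits
    # Each answer token is predicted from the position just before it.
    logprobs = torch.log_softmax(logits[0, prompt_len - 1:-1], dim=-1)
    answer_ids = enc.input_ids[0, prompt_len:]
    return logprobs[torch.arange(answer_ids.shape[0]), answer_ids].sum().item()

# head_mask has shape (n_layers, n_heads); 1.0 keeps a head, 0.0 masks its attention weights.
mask = torch.ones(model.config.n_layer, model.config.n_head)
for layer, head in heads_to_ablate:
    mask[layer, head] = 0.0

prompt = "France -> Paris\nJapan -> Tokyo\nItaly ->"
answer = " Rome"
print("intact :", answer_logprob(prompt, answer))
print("ablated:", answer_logprob(prompt, answer, head_mask=mask))

Masking attention weights is only one possible ablation choice; mean-ablating per-head outputs, as is common in this line of work, would instead require a forward hook on the attention module's pre-projection outputs.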