This paper studies in-context learning (ICL) by decomposing the output of
large language models into the individual contributions of attention heads and
MLPs (components). We observe three curious kinds of components: good-performing
ones that individually do well on a classification task, even when the full
model performs poorly; bad-performing ones that do much worse than chance; and
label-biased ones that always predict the same label. We find that component
accuracies are well-correlated across different demonstration sets and
perturbations of prompt templates, even when the full-model accuracy varies
greatly. Based on our findings, we propose component reweighting, which learns
from a few labeled examples to linearly re-scale the component activations.
Given 24 labeled examples, our method improves accuracy by an average of 6.0
percentage points over 24-shot ICL across 8 tasks on Llama-2-7B. Overall, this
paper both
enriches our understanding of ICL and provides a practical method for
improvement by examining model internals.
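
To make the idea of component reweighting concrete, here is a minimal sketch on toy data. It is not the paper's implementation. It assumes that each component contributes additively to the output logits, that the weights start at 1 (which recovers the unweighted model), and that they are fit with plain gradient descent on a cross-entropy loss. The function names, the toy logits, and the hyperparameters are all hypothetical.

```python
import numpy as np

def component_accuracies(comp_logits, labels):
    """Accuracy of each component when it predicts on its own.

    comp_logits has shape (n_examples, n_components, n_classes).
    """
    preds = comp_logits.argmax(axis=-1)  # (n_examples, n_components)
    return (preds == labels[:, None]).mean(axis=0)

def combined_logits(comp_logits, weights):
    """Weighted sum of component logits; all-ones weights give the plain sum."""
    return np.einsum("c,nck->nk", weights, comp_logits)

def accuracy(logits, labels):
    return float((logits.argmax(axis=1) == labels).mean())

def learn_weights(comp_logits, labels, steps=200, lr=0.1):
    """Fit one scalar weight per component (assumed objective: cross-entropy)."""
    n, c, k = comp_logits.shape
    weights = np.ones(c)
    onehot = np.eye(k)[labels]
    for _ in range(steps):
        logits = combined_logits(comp_logits, weights)
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        # Gradient of the mean cross-entropy with respect to each weight.
        grad = np.einsum("nk,nck->c", probs - onehot, comp_logits) / n
        weights -= lr * grad
    return weights

# Toy data (hypothetical): 24 labeled examples, 2 classes, 3 components.
labels = np.array([0, 1] * 12)
n = len(labels)
comp_logits = np.zeros((n, 3, 2))
comp_logits[np.arange(n), 0, labels] = 1.0      # good component: favors the true label
comp_logits[np.arange(n), 1, 1 - labels] = 2.0  # bad component: favors the wrong label
comp_logits[:, 2, 0] = 0.5                      # label-biased component: always class 0

acc_before = accuracy(combined_logits(comp_logits, np.ones(3)), labels)
weights = learn_weights(comp_logits, labels)
acc_after = accuracy(combined_logits(comp_logits, weights), labels)
print(component_accuracies(comp_logits, labels), acc_before, acc_after, weights)
```

In this toy setup the bad component dominates the unweighted sum, so the full model does poorly even though one component is accurate on its own. Learning the weights scales the good component up and the bad one down.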
