In-Context Learning (ICL) has been a powerful emergent property of large language models that has attracted increasing attention in recent years. In contrast to regular gradient-based learning, ICL is highly interpretable and does not require parameter updates. In this paper, we show that, for linearized transformer networks, ICL can be made explicit and permanent through the inclusion of bias terms. We mathematically demonstrate the equivalence between a model with ICL demonstration prompts and the same model with the additional bias terms. Our algorithm (ICLCA) allows for exact conversion in an inexpensive manner. Existing methods are not exact and require expensive parameter updates. We demonstrate the efficacy of our approach through experiments that show the exact incorporation of ICL tokens into a linear transformer. We further suggest how our method can be adapted to achieve cheap approximate conversion of ICL tokens, even in regular transformer networks that are not linearized. Our experiments on GPT-2 show that, even though the conversion is only approximate, the model still gains valuable context from the included bias terms.

在这篇论文中，我们展示了一个算法（ICLCA），通过在线性变换网络中加入偏置项，可以使得上下文学习（ICL）得以明确和持久化。我们在数学上证明了通过ICL演示提示的模型与具有额外偏置项的同一模型之间的等价性。我们的方法允许以低成本进行精确转换，而现有方法并不精确且需要昂贵的参数更新。我们通过实验展示了我们方法的有效性，展示了将ICL令牌精确地纳入线性变换器中。我们进一步提出了如何适应我们的方法，以实现ICL令牌的便宜近似转换，即使在非线性化的常规变换网络中也可以实现。我们在GPT-2上的实验表明，即使转换只是近似的，模型仍然从包含的偏置项中获得了有价值的上下文。

上下文学习转模型权重的精确转换