Large language models (LLMs) have shown remarkable capacity for in-context learning (ICL), where learning a new task from just a few training examples is done without being explicitly pre-trained. However, despite the success of LLMs, there has been little understanding of how ICL learns the knowledge from the given prompts. In this paper, to make progress toward understanding the learning behaviour of ICL, we train the same LLMs with the same demonstration examples via ICL and supervised learning (SL), respectively, and investigate their performance under label perturbations (i.e., noisy labels and label imbalance) on a range of classification tasks. First, via extensive experiments, we find that gold labels have significant impacts on the downstream in-context performance, especially for large language models; however, imbalanced labels matter little to ICL across all model sizes. Second, when comparing with SL, we show empirically that ICL is less sensitive to label perturbations than SL, and ICL gradually attains comparable performance to SL as the model size increases.

大型语言模型（LLMs）在上下文学习（ICL）方面展示了显着的能力，在没有明确预训练的情况下，仅通过少量的训练示例学习新任务。然而，尽管LLMs获得了成功，对于ICL如何从给定的提示中学习知识却知之甚少。在本文中，为了对ICL的学习行为有所了解，我们通过ICL和监督学习(SL)分别使用相同的演示示例训练相同的LLMs，并研究它们在一系列分类任务中在标签扰动（即嘈杂标签和标签不平衡）下的表现。通过广泛的实验证明，我们首先发现黄金标签对下游上下文性能有显著影响，尤其是对于大型语言模型；然而，对于所有模型大小，不平衡标签对ICL的影响较小。其次，通过与SL进行比较，我们实证表明ICL对标签扰动的敏感性较低，并且随着模型大小的增加，ICL逐渐获得与SL相当的性能。

探究上下文学习行为：与监督学习的比较