The emergence of in-context learning (ICL) is potentially attributed to two
major abilities: task recognition (TR) for recognizing the task from
demonstrations and utilizing pre-trained priors, and task learning (TL) for
learning from demonstrations. However, the relationship between the two abilities
and how it affects the emergence of ICL remain unclear. In this
paper, we take the first step by examining the pre-training dynamics of the
emergence of ICL. With carefully designed metrics, we find that these two
abilities are, in fact, competitive during pre-training. Moreover, we observe a
strong negative correlation between the competition and ICL performance.
Further analysis of common pre-training factors (i.e., model size, dataset
size, and data curriculum) demonstrates possible ways to manage the
competition. Based on these insights, we propose a simple yet effective method
to better integrate these two abilities for ICL at inference time. Through
adaptive ensemble learning, the performance of ICL can be significantly
boosted, enabling two small models to outperform a larger one with more than
twice the parameters. The code is available at
this https URL
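
The abstract does not spell out how the adaptive ensemble combines the two models, so the following is only a minimal sketch of one plausible reading: two models each produce a probability distribution over candidate labels, and the distributions are mixed with weights that depend on how confident each model is. The function names and the entropy-based weighting rule are assumptions made for illustration and are not taken from the paper.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def adaptive_ensemble(probs_a, probs_b):
    """Mix two label distributions, weighting the more confident one higher.

    Hypothetical weighting rule: each model's weight is proportional to
    exp(-entropy) of its own prediction. This is an illustrative choice,
    not the rule used in the paper.
    """
    if len(probs_a) != len(probs_b):
        raise ValueError("both models must score the same label set")

    # Lower entropy means a sharper, more confident prediction.
    w_a = math.exp(-entropy(probs_a))
    w_b = math.exp(-entropy(probs_b))
    total = w_a + w_b
    w_a, w_b = w_a / total, w_b / total

    # Convex combination of the two distributions, so the result still sums to 1.
    mixed = [w_a * pa + w_b * pb for pa, pb in zip(probs_a, probs_b)]
    return mixed, (w_a, w_b)
```

Because the weights are recomputed for every input, the mixture can lean on whichever model is sharper for that example; a fixed average would be the non-adaptive baseline.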

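This work studies how competition between two abilities, task recognition (which draws on pre-trained priors) and task learning, affects the emergence of in-context learning, and proposes a simple yet effective method to better integrate the two abilities at inference time. Through adaptive ensemble learning, in-context learning performance can be significantly improved, enabling two small models to outperform a larger model with more than twice the parameters.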

Investigating the Pre-Training Dynamics of In-Context Learning: Task Recognition vs. Task Learning

Multilingual Large Language Models (LLMs) achieve remarkable levels of
zero-shot cross-lingual transfer performance. We speculate that this is
predicated on their ability to align languages without explicit supervision
from parallel sentences. While representations of translationally equivalent
sentences in different languages are known to be similar after convergence,
it remains unclear how such cross-lingual alignment emerges during
pre-training of LLMs. Our study leverages intrinsic probing techniques, which
identify the subsets of neurons that encode linguistic features, to correlate the
degree of cross-lingual neuron overlap with the zero-shot cross-lingual
transfer performance for a given model. In particular, we rely on checkpoints
of BLOOM, a multilingual autoregressive LLM, across different training steps
and model scales. We observe a high correlation between neuron overlap and
downstream performance, which supports our hypothesis on the conditions leading
to effective cross-lingual transfer. Interestingly, we also detect a
degradation of both implicit alignment and multilingual abilities in certain
phases of the pre-training process, providing new insights into the
multilingual pre-training dynamics.
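
The analysis described above can be sketched roughly as follows: for each checkpoint, measure how much the neuron subsets selected for two languages overlap, then correlate that overlap with zero-shot transfer performance across checkpoints. The abstract does not state which overlap measure or correlation coefficient is used, so the Jaccard index and Pearson correlation below are assumptions, and the function names are hypothetical.

```python
import math


def neuron_overlap(neurons_a, neurons_b):
    """Jaccard overlap between two sets of neuron indices (assumed measure)."""
    a, b = set(neurons_a), set(neurons_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy usage with made-up neuron indices (not results from the study):
# two languages share neurons 2 and 3 out of four distinct neurons.
overlap = neuron_overlap([1, 2, 3], [2, 3, 4])
```

In practice one would compute one overlap value and one transfer score per checkpoint, then pass the two resulting lists to `pearson`.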

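Multilingual large language models achieve zero-shot cross-lingual transfer through implicit alignment of languages, reflected in cross-lingual neuron overlap. Using intrinsic probing techniques across checkpoints, this study observes a high correlation between neuron overlap and downstream performance. It also detects a degradation of implicit alignment and multilingual abilities during certain phases of pre-training, providing new insights into multilingual pre-training dynamics.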