Self-supervised large language models have demonstrated the ability to perform Machine Translation (MT) via in-context learning, but little is known about where in the model the task is performed with respect to prompt instructions and demonstration examples. In this work, we attempt to characterize the region where large language models transition from in-context learners to translation models. Through a series of layer-wise context-masking experiments on \textsc{GPTNeo2.7B}, \textsc{Bloom3B}, \textsc{Llama7b} and \textsc{Llama7b-chat}, we demonstrate evidence of a "task recognition" point where the translation task is encoded into the input representations and attention to context is no longer necessary. We further observe a correspondence between the task recognition layers and the layers at which masking out the entire layer leads to low performance. Exploiting the redundancy of attending to the context beyond the task recognition point results in 45\% computational savings when prompting with 5 examples, with task recognition achieved at layer 14 of 32. Our layer-wise fine-tuning experiments indicate that the most effective layers for MT fine-tuning are the layers critical to task recognition.
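
As a rough illustration of what layer-wise context masking could look like, the sketch below builds a per-layer attention mask for a decoder-only model. It is a minimal sketch under stated assumptions, not the implementation used in the experiments: the function name `layer_attention_mask`, the parameter names, and the convention that the context (instructions and examples) occupies the first `context_len` positions are all hypothetical.

```python
def layer_attention_mask(seq_len, context_len, layer, mask_from_layer):
    """Return a seq_len x seq_len boolean mask for one layer.

    mask[q][k] is True when query position q may attend to key position k.
    Assumption (hypothetical convention): positions [0, context_len) hold the
    prompt context, and the remaining positions hold the source sentence and
    the generated translation.
    """
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            # Standard causal constraint: attend only to self and earlier positions.
            allowed = k <= q
            # From mask_from_layer onward, non-context positions lose access
            # to the context positions.
            if layer >= mask_from_layer and q >= context_len and k < context_len:
                allowed = False
            row.append(allowed)
        mask.append(row)
    return mask


# Example: 4 context tokens followed by 2 non-context tokens.
early = layer_attention_mask(seq_len=6, context_len=4, layer=3, mask_from_layer=14)
late = layer_attention_mask(seq_len=6, context_len=4, layer=14, mask_from_layer=14)
```

Sweeping `mask_from_layer` over all layers and measuring translation quality at each setting is one way to look for the point after which attention to the context stops mattering.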

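Through layer-wise context-masking experiments, we demonstrate that large language models have a task recognition point at which the task is encoded into the input representations and attention to the context is no longer needed. We also observe a correspondence between the low performance when masking out entire layers and the task recognition layers. Exploiting this redundancy saves 45% of computation when prompting with 5 examples, with task recognition achieved at layer 14 of 32, and layer-wise fine-tuning experiments indicate that the most effective layers for MT fine-tuning are the layers critical to task recognition.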

Where Does In-context Translation Happen in Large Language Models