The emergent cross-lingual transfer seen in multilingual pretrained models has sparked significant interest in studying their behavior. However, because these analyses have focused on fully trained multilingual models, little is known about the dynamics of the multilingual pretraining process. We investigate when these models acquire their in-language and cross-lingual abilities by probing checkpoints taken from throughout XLM-R pretraining, using a suite of linguistic tasks. Our analysis shows that the model achieves high in-language performance early on, with lower-level linguistic skills acquired before more complex ones. In contrast, when the model learns to transfer cross-lingually depends on the language pair. Interestingly, we also observe that, across many languages and tasks, the final, converged model checkpoint exhibits significant performance degradation and that no one checkpoint performs best on all languages. Taken together with our other findings, these insights highlight the complexity and interconnectedness of multilingual pretraining.

本研究旨在探究跨语言预训练模型的学习过程，发现该模型在语言内表现出较高的性能，复杂任务在低级语言技能前学习。添加不同的语言对跨语言转移的学习时机不同，并且最终模型层表现存在时间衰减现象，语言知识向网络底层传递。

多语言语言模型的单/跨语言预训练动态分析