Jul, 2020
InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training
Zewen Chi, Li Dong, Furu Wei, Nan Yang, Saksham Singhal...
TL;DR
This work proposes an information-theoretic framework that formulates cross-lingual language model pre-training as maximizing mutual information between multilingual, multi-granularity texts, improving the cross-lingual transferability of pre-trained models, and introduces a contrastive-learning-based pre-training task that yields better pre-trained model performance.
Abstract
In this work, we formulate cross-lingual language model pre-training as maximizing mutual information between multilingual-multi-granularity texts.
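The contrastive pre-training task mentioned above can be illustrated with an InfoNCE-style loss, which is a standard lower bound on mutual information. The sketch below is not the paper's exact objective; the function name `info_nce_loss`, the temperature value, and the use of in-batch negatives over random embeddings are illustrative assumptions.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE loss: in-batch contrastive objective (illustrative sketch).

    anchors, positives: (N, D) arrays of sentence embeddings, where row i
    of `positives` is the translation of row i of `anchors`. The other
    rows in the batch serve as negatives.
    """
    # L2-normalize so the dot product becomes a cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature  # (N, N) similarity matrix

    # Cross-entropy with the diagonal (the true translation) as the target.
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
# Well-aligned "translation" pairs: positives close to their anchors.
loss_aligned = info_nce_loss(x, x + 0.01 * rng.normal(size=(8, 16)))
# Misaligned pairs: positives are unrelated random embeddings.
loss_random = info_nce_loss(x, rng.normal(size=(8, 16)))
```

Minimizing this loss pulls parallel sentences together in the shared embedding space while pushing apart non-parallel ones, which is the intuition behind treating pre-training as mutual-information maximization; aligned pairs produce a lower loss than random pairs.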