Jul, 2020
InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training
Zewen Chi, Li Dong, Furu Wei, Nan Yang, Saksham Singhal...
TL;DR
This work proposes an information-theoretic framework that formulates cross-lingual language model pre-training as maximizing mutual information between multilingual, multi-granularity texts, improving the cross-lingual transferability of pre-trained models, and introduces a contrastive-learning-based pre-training task that yields better pre-trained model performance.
Abstract
In this work, we formulate cross-lingual language model pre-training as maximizing mutual information between multilingual-multi-granularity texts.
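The contrastive pre-training task mentioned above can be illustrated with an InfoNCE-style loss, which is a standard lower bound on mutual information. The sketch below is not the paper's exact objective; the function name `info_nce_loss`, the temperature value, and the use of in-batch negatives over random embeddings are illustrative assumptions.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE loss: in-batch contrastive objective (illustrative sketch).

    anchors, positives: (N, D) arrays of sentence embeddings, where row i
    of `positives` is the translation of row i of `anchors`. The other
    rows in the batch serve as negatives.
    """
    # L2-normalize so the dot product becomes a cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature  # (N, N) similarity matrix

    # Cross-entropy with the diagonal (the true translation) as the target.
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
# Well-aligned "translation" pairs: positives close to their anchors.
loss_aligned = info_nce_loss(x, x + 0.01 * rng.normal(size=(8, 16)))
# Misaligned pairs: positives are unrelated random embeddings.
loss_random = info_nce_loss(x, rng.normal(size=(8, 16)))
```

Minimizing this loss pulls parallel sentences together in the shared embedding space while pushing apart non-parallel ones, which is the intuition behind treating pre-training as mutual-information maximization; aligned pairs produce a lower loss than random pairs.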