We propose a simple method to align multilingual contextual embeddings as a post-pretraining step for improved zero-shot cross-lingual transferability of pretrained models. Using parallel data, our method aligns embeddings on the word level through the recently proposed Translation Language Modeling objective, and on the sentence level via contrastive learning and random input shuffling. We also perform code-switching with English when finetuning on downstream tasks. On XNLI, our best model (initialized from mBERT) improves over mBERT by 4.7% in the zero-shot setting and achieves results comparable to XLM for translate-train while using less than 18% of the same parallel data and 31% fewer model parameters. On MLQA, our model outperforms XLM-R_Base, which has 57% more parameters than ours.

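This work proposes a simple method for aligning multilingual contextual embeddings as a post-pretraining step, in order to improve the zero-shot cross-lingual transferability of pretrained models. The method aligns embeddings at the word level through the recently proposed Translation Language Modeling objective, and at the sentence level through contrastive learning and random input shuffling. When finetuning on downstream tasks, it applies sentence-level code-switching with English. On XNLI, the best model (initialized from mBERT) improves over mBERT by 4.7% in the zero-shot setting and achieves results comparable to XLM for translate-train while using less than 18% of the same parallel data and 31% fewer model parameters. On MLQA, the model outperforms XLM-R_Base, which has 57% more parameters.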
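To make the sentence-level alignment idea concrete, the following is a minimal sketch of a contrastive objective over parallel sentence pairs. It assumes a generic InfoNCE-style loss with in-batch negatives and cosine similarity; the abstract does not specify the exact loss, so the function name `contrastive_loss`, the `temperature` parameter, and the use of plain Python lists as sentence embeddings are all illustrative assumptions, not the authors' implementation.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a non-zero vector to unit length so dot products are cosines."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def contrastive_loss(src, tgt, temperature=0.1):
    """InfoNCE-style loss over a batch of parallel sentence embeddings.

    src[i] and tgt[i] are assumed to embed a sentence and its translation.
    Every other tgt[j] in the batch is treated as a negative for src[i].
    (Hypothetical helper for illustration; not the paper's exact objective.)
    """
    src = [normalize(v) for v in src]
    tgt = [normalize(v) for v in tgt]
    total = 0.0
    for i, s in enumerate(src):
        # Temperature-scaled cosine similarity to every target in the batch.
        logits = [dot(s, t) / temperature for t in tgt]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the true translation.
        total += log_denom - logits[i]
    return total / len(src)
```

Under these assumptions, the loss is low when each source embedding is closest to its own translation and high when the pairing is scrambled, which is the pressure that pulls translation pairs together in embedding space.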

Post-Pretraining Alignment of Multilingual BERT Models