Multilingual machine translation enables a single model to translate between different languages. Most existing multilingual machine translation systems adopt a randomly initialized Transformer backbone. In this work, inspired by the recent success of language model pre-training, we present XLM-T, which initializes the model with an off-the-shelf pretrained cross-lingual Transformer encoder and fine-tunes it with multilingual parallel data. This simple method achieves significant improvements on a WMT dataset with 10 language pairs and on the OPUS-100 corpus with 94 pairs. Surprisingly, the method remains effective even on top of a strong back-translation baseline. Moreover, an extensive analysis of XLM-T on unsupervised syntactic parsing, word alignment, and multilingual classification helps explain why it is effective for machine translation. The code will be available at https://aka.ms/xlm-t.
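The core idea is to start from pretrained weights instead of a random backbone. The framework-free sketch below illustrates that initialization step under stated assumptions: parameters are modeled as plain Python lists keyed by name, every parameter name (`embed_tokens`, `encoder.layer0.ffn`, `decoder.layer0.cross_attn`) is hypothetical, and the 0.02 standard deviation for fresh weights is an illustrative choice. It is not the XLM-T code, and the subsequent fine-tuning on multilingual parallel data is not shown.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy representation: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def init_from_pretrained(
    model_shapes: Dict[str, int], pretrained: Params, seed: int = 0
) -> Tuple[Params, List[str]]:
    """Initialize a translation model from a pretrained checkpoint.

    Parameters whose name and size match the checkpoint are copied; all
    others (e.g. modules absent from an encoder-only checkpoint) are drawn
    from a small Gaussian. Returns the parameters and the names that were
    randomly initialized.
    """
    rng = random.Random(seed)
    params: Params = {}
    random_init: List[str] = []
    for name, size in model_shapes.items():
        src = pretrained.get(name)
        if src is not None and len(src) == size:
            params[name] = list(src)  # copy, so fine-tuning leaves the checkpoint untouched
        else:
            params[name] = [rng.gauss(0.0, 0.02) for _ in range(size)]
            random_init.append(name)
    return params, random_init


# Usage with made-up names and sizes.
pretrained = {"embed_tokens": [0.5, 0.6], "encoder.layer0.ffn": [0.1, 0.2, 0.3]}
shapes = {"embed_tokens": 2, "encoder.layer0.ffn": 3, "decoder.layer0.cross_attn": 4}
params, fresh = init_from_pretrained(shapes, pretrained)
print(fresh)  # ['decoder.layer0.cross_attn']
```

Matching on both name and size means a pretrained tensor is only reused where it fits; anything the checkpoint cannot supply falls back to random initialization.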

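This paper introduces XLM-T, a simple method that initializes the model with a pretrained cross-lingual Transformer encoder and fine-tunes it on multilingual parallel data. It achieves significant performance gains on a WMT dataset covering 10 language pairs and on the OPUS-100 corpus covering 94 language pairs. In addition, an extensive analysis of XLM-T on unsupervised syntactic parsing, word alignment, and multilingual classification illustrates why it is effective for machine translation.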

XLM-T: Scaling up Multilingual Machine Translation with Pretrained Cross-lingual Transformer Encoders