Multilingual Neural Machine Translation (NMT) models have achieved considerable empirical success in transfer learning settings. However, these black-box representations are poorly understood, and their mode of transfer remains elusive. In this work, we attempt to understand massively multilingual NMT representations (with 103 languages) using Singular Value Canonical Correlation Analysis (SVCCA), a representation similarity framework that allows us to compare representations across different languages, layers and models. Our analysis validates several empirical results and long-standing intuitions, and unveils new observations regarding how representations evolve in a multilingual translation model. We draw three major conclusions from our analysis, with implications for cross-lingual transfer learning: (i) encoder representations of different languages cluster based on linguistic similarity, (ii) representations of a source language learned by the encoder are dependent on the target language, and vice versa, and (iii) representations of high-resource and/or linguistically similar languages are more robust when fine-tuning on an arbitrary language pair, which is critical to determining how much cross-lingual transfer can be expected in a zero-shot or few-shot setting. We further connect our findings with existing empirical observations in multilingual NMT and transfer learning.
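
To make the idea of an SVCCA comparison concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes two activation matrices recorded over the same set of inputs (rows are datapoints, columns are neurons), keeps the singular directions that explain a chosen fraction of the variance, and reports the mean canonical correlation between the two reduced subspaces. The function names and the 0.99 variance threshold are illustrative assumptions of this sketch.

```python
import numpy as np


def _svd_reduce(acts, var_kept=0.99):
    """Center activations and keep the top singular directions.

    `var_kept` (an assumed default) is the fraction of variance retained.
    """
    acts = acts - acts.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(acts, full_matrices=False)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    # Smallest k whose cumulative variance reaches var_kept.
    k = min(int(np.searchsorted(energy, var_kept)) + 1, len(s))
    return u[:, :k] * s[:k]


def svcca_similarity(x, y, var_kept=0.99):
    """Mean canonical correlation between two sets of activations.

    x: array of shape (n_points, d1); y: array of shape (n_points, d2).
    Both must be computed on the same n_points inputs, in the same order.
    """
    x_r = _svd_reduce(x, var_kept)
    y_r = _svd_reduce(y, var_kept)
    # Orthonormal bases of the reduced, centered subspaces.
    qx, _ = np.linalg.qr(x_r)
    qy, _ = np.linalg.qr(y_r)
    # Canonical correlations are the singular values of qx^T qy.
    rho = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return float(np.clip(rho, 0.0, 1.0).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=(500, 10))
    b = rng.normal(size=(500, 10))
    print("unrelated activations:", svcca_similarity(a, b))
    print("identical activations:", svcca_similarity(a, a))
```

In a setting like the one described above, `x` and `y` could be, for example, encoder activations for two languages at the same layer, or activations of the same layer before and after fine-tuning; how the datapoints are aligned across languages is a design choice this sketch leaves open.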

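This work uses Singular Value Canonical Correlation Analysis (SVCCA) to analyze an NMT model covering 103 languages. It finds that encoder representations of different languages cluster by linguistic similarity, that source-language and target-language representations depend on each other, and that representations of high-resource and/or linguistically similar languages are more robust when fine-tuning on an arbitrary language pair. These conclusions matter for cross-lingual transfer learning, and the work further connects them to existing empirical observations.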

Investigating Multilingual NMT Representations at Scale