Pretrained multilingual encoders enable zero-shot cross-lingual transfer, but often produce unreliable models that exhibit high performance variance on the target language. We postulate that this high variance results from zero-shot cross-lingual transfer solving an under-specified optimization problem. We show that any linearly interpolated model between the source-language monolingual model and the source + target bilingual model has equally low source-language generalization error, yet the target-language generalization error decreases smoothly and linearly as we move from the monolingual to the bilingual model. This suggests that the model struggles to identify solutions that are good for both the source and target languages using the source language alone. Additionally, we show that the zero-shot solution lies in a non-flat region of the target-language generalization error surface, causing the high variance.
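To make the interpolation concrete, here is a minimal sketch of a linear path between two sets of parameters, theta(alpha) = (1 - alpha) * theta_mono + alpha * theta_bi. The names `interpolate`, `theta_mono`, and `theta_bi`, and the toy parameter values, are illustrative assumptions and do not come from the paper; real checkpoints would hold encoder weights rather than two-element lists.

```python
def interpolate(theta_mono, theta_bi, alpha):
    """Return (1 - alpha) * theta_mono + alpha * theta_bi, parameter by parameter.

    Both arguments are dicts mapping a parameter name to a flat list of floats
    (a hypothetical stand-in for real model checkpoints).
    """
    return {
        name: [(1.0 - alpha) * m + alpha * b
               for m, b in zip(theta_mono[name], theta_bi[name])]
        for name in theta_mono
    }


# Toy parameters standing in for the source-only (monolingual) model
# and the source + target (bilingual) model.
theta_mono = {"w": [1.0, 0.0], "b": [0.0]}
theta_bi = {"w": [0.0, 1.0], "b": [1.0]}

# Five evenly spaced points: alpha = 0 is the monolingual model,
# alpha = 1 is the bilingual model.
alphas = [i / 4 for i in range(5)]
path = [interpolate(theta_mono, theta_bi, a) for a in alphas]
```

In a study like the one described, each model along `path` would then be evaluated on held-out source-language and target-language data to trace how the two generalization errors change with alpha.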

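Our study finds that zero-shot cross-lingual transfer with pretrained multilingual encoders can yield unreliable, high-variance models, because zero-shot cross-lingual transfer solves an under-constrained optimization problem. Models linearly interpolated between the monolingual and bilingual solutions keep the source-language error equally low while the target-language error falls steadily, so source-language data alone cannot single out a solution that is good for both languages. In addition, the zero-shot solution lies in a non-flat region of the target-language generalization error surface, which leads to the high variance.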

Zero-shot Cross-lingual Transfer is Under-specified Optimization