Pretrained contextual representation models (Peters et al., 2018; Devlin et al., 2018) have pushed forward the state of the art on many NLP tasks. A new release of BERT (Devlin, 2018) includes a model simultaneously pretrained on 104 languages with impressive performance for zero-shot cross-lingual transfer on a natural language inference task. This paper explores the broader cross-lingual potential of multilingual BERT (mBERT) as a zero-shot language transfer model on 5 NLP tasks covering a total of 39 languages from various language families: NLI, document classification, NER, POS tagging, and dependency parsing. We compare mBERT with the best published methods for zero-shot cross-lingual transfer and find mBERT competitive on each task. Additionally, we investigate the most effective strategy for utilizing mBERT in this manner, determine to what extent mBERT generalizes away from language-specific features, and measure factors that influence cross-lingual transfer.

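This study explores the use of mBERT as a zero-shot language transfer model on five cross-lingual tasks: NLI, document classification, NER, POS tagging, and dependency parsing. It finds that mBERT is competitive on every task, and it examines the strategies for using mBERT, the language-independent features it learns, and the factors that influence cross-lingual transfer.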

Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT