Scaling multilingual representation learning beyond the hundred most frequent
languages is challenging, in particular to cover the long tail of low-resource
languages. A promising approach has been to train one-for-all multilingual
models capable of cross-lingual transfer, but these mo