Large-scale generative language models such as GPT-3 are competitive few-shot
learners. While these models are known to be able to jointly represent many
different languages, their training data is dominated by English, potentially
limiting their cross-lingual generalization. In this w