As the capabilities of language models continue to advance, it is conceivable
that the "one-size-fits-all" model will remain the main paradigm. For instance,
given the vast number of languages worldwide, many of which are low-resource,
the prevalent practice is to pretrain a single model on data from many languages.