The recent increase in data and model scale for language model pre-training has led to huge training costs. In scenarios where new data become available over time, updating a model instead of fully retraining it would therefore provide significant gains. In this paper, we study the benefits and downsides of updating a language model when new data come from new languages: the case of continual learning under language shift. Starting from a monolingual English language model, we incrementally add data from Norwegian and Icelandic to investigate how forward and backward transfer effects depend on the pre-training order and characteristics of languages, for different model sizes and learning rate schedulers. Our results show that, while forward transfer is largely positive and independent of language order, backward transfer can be either positive or negative depending on the order and characteristics of new languages. To explain these patterns, we explore several language similarity metrics and find that syntactic similarity appears to correlate best with our results.

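In language model pre-training, updating a model rather than fully retraining it can provide significant gains as new data keep arriving. This paper studies the benefits and downsides of updating a language model under language shift, where the new data come from new languages. By incrementally adding data from languages such as Norwegian and Icelandic to a monolingual English language model, we study forward and backward transfer effects for different model sizes and learning rate schedulers. We find that forward transfer is largely positive and independent of language order, whereas backward transfer can be positive or negative depending on the order and characteristics of the new languages. To explain these patterns, we explore several language similarity metrics and find that syntactic similarity correlates best with our results.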

A Study of Continual Learning Under Language Shift