For many (minority) languages, the resources needed to train large models are not available. We investigate the performance of zero-shot transfer learning with as little data as possible, and the influence of language similarity in this process. We retrain the lexical layers of four BERT-based models using data from two low-resource target language varieties, while the Transformer layers are independently fine-tuned on a POS-tagging task in the model's source language. By combining the new lexical layers and fine-tuned Transformer layers, we achieve high task performance for both target languages. With high language similarity, 10MB of data appears sufficient to achieve substantial monolingual transfer performance. Monolingual BERT-based models generally achieve higher downstream task performance after retraining the lexical layer than multilingual BERT, even when the target language is included in the multilingual model.
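The combination step described above can be illustrated with a minimal sketch. It assumes each model is represented as a dictionary mapping parameter names to values, and that the lexical layer can be identified by a name prefix. The function name `combine_layers`, the prefix `embeddings.`, and the toy parameter names are hypothetical illustrations, not the authors' implementation.

```python
def combine_layers(lexical_state, task_state, lexical_prefix="embeddings."):
    """Merge two parameter dictionaries into one model state.

    Parameters whose names start with `lexical_prefix` come from
    `lexical_state` (retrained on target-language data). All other
    parameters come from `task_state` (fine-tuned on the source-language task).
    The prefix is an assumption about how the lexical layer is named.
    """
    combined = {}
    # Keep the task-specific Transformer and classifier parameters.
    for name, value in task_state.items():
        if not name.startswith(lexical_prefix):
            combined[name] = value
    # Swap in the lexical layer retrained for the target language.
    for name, value in lexical_state.items():
        if name.startswith(lexical_prefix):
            combined[name] = value
    return combined


# Toy stand-ins for real parameter tensors (hypothetical names and values).
target_lexical = {
    "embeddings.word_embeddings": "target-language embeddings",
    "encoder.layer.0.attention": "pre-trained, not fine-tuned",
}
source_tagger = {
    "embeddings.word_embeddings": "source-language embeddings",
    "encoder.layer.0.attention": "fine-tuned on source-language POS tagging",
    "classifier.weight": "POS tag classifier",
}

combined = combine_layers(target_lexical, source_tagger)
```

Because the function only inspects parameter names, the same idea should carry over to real parameter dictionaries, provided the lexical layer's parameters share a common name prefix in the model at hand.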

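This paper investigates zero-shot transfer learning with as little data as possible and the influence of language similarity on that process. The authors retrain the lexical layers of four BERT-based models on data from two low-resource target language varieties, while the Transformer layers are independently fine-tuned on a POS-tagging task in each model's source language. Combining the new lexical layers with the fine-tuned Transformer layers yields high task performance for both target languages, and when language similarity is high, 10MB of data appears sufficient for substantial transfer performance. Notably, after retraining the lexical layer, monolingual BERT-based models generally outperform multilingual BERT on the downstream task, even when the target language is included in the multilingual model.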

Adapting Monolingual Models: Data can be Scarce when Language Similarity is High