Julian Eisenschlos, Sebastian Ruder, Piotr Czapla, Marcin Kardas, Sylvain Gugger...
TL;DR: This work proposes Multi-lingual language model Fine-Tuning (MultiFiT), a method that lets practitioners efficiently train and fine-tune pretrained language models from unlabelled data, which is especially valuable for low-resource languages. It also enables a zero-shot cross-lingual setup, and on two cross-lingual classification datasets it outperforms pretrained models that use far more data and compute.
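To make the two-stage recipe behind the TL;DR concrete, here is a minimal sketch: first fine-tune a language model on unlabelled target-language text, then fine-tune the result as a classifier on a small labelled set. Hugging Face transformers is used purely as a stand-in for the paper's own fastai/QRNN implementation, and the checkpoint name, file names, and hyperparameters are illustrative assumptions rather than the authors' setup.

```python
# Sketch of a MultiFiT-style two-stage pipeline (illustrative assumptions only).
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

checkpoint = "distilgpt2"  # assumption: any small causal LM checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
tokenizer.pad_token = tokenizer.eos_token

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

# Stage 1: language-model fine-tuning on raw, unlabelled target-language text.
unlabelled = load_dataset("text", data_files={"train": "unlabelled.txt"})["train"]
lm_data = unlabelled.map(tokenize, batched=True, remove_columns=["text"])
lm_model = AutoModelForCausalLM.from_pretrained(checkpoint)
Trainer(
    model=lm_model,
    args=TrainingArguments(output_dir="lm_ft", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=lm_data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
lm_model.save_pretrained("lm_ft")
tokenizer.save_pretrained("lm_ft")

# Stage 2: classifier fine-tuning on a small labelled set (a CSV with "text"
# and "label" columns), initialised from the language model tuned above.
labelled = load_dataset("csv", data_files={"train": "labelled.csv"})["train"]
cls_data = labelled.map(tokenize, batched=True, remove_columns=["text"])
cls_model = AutoModelForSequenceClassification.from_pretrained("lm_ft", num_labels=2)
cls_model.config.pad_token_id = tokenizer.pad_token_id
Trainer(
    model=cls_model,
    args=TrainingArguments(output_dir="cls_ft", num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=cls_data,
    tokenizer=tokenizer,  # enables dynamic padding of classification batches
).train()
```

The key design point this sketch mirrors is that only Stage 2 needs labels; Stage 1 adapts the model to the target language using unlabelled text alone, which is what makes the approach attractive for low-resource languages.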
Abstract
Pretrained language models are promising particularly for low-resource languages as they only require unlabelled data. However, training existing models requires huge amounts of compute, while pretrained