In recent years, the development of pre-trained language models (PLMs) has gained momentum, demonstrating their capacity to transcend linguistic barriers and transfer knowledge across diverse languages. However, this progress has largely bypassed very low-resource languages, leaving a notable gap in the multilingual landscape. This paper addresses that gap by introducing four PLMs fine-tuned specifically for Angolan languages using a Multilingual Adaptive Fine-tuning (MAFT) approach. We examine the role of informed embedding initialization and synthetic data in improving the performance of MAFT models on downstream tasks. Our models improve over the state-of-the-art baselines AfroXLMR-base (developed through MAFT) and OFA (an effective embedding initialization) by 12.3 and 3.8 points, respectively.

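By introducing four pre-trained language models (PLMs) fine-tuned for Angolan languages with a Multilingual Adaptive Fine-tuning (MAFT) approach, this paper examines how informed embedding initialization and synthetic data improve the performance of MAFT models on downstream tasks, improving over the SOTA baselines AfroXLMR-base (developed through MAFT) and OFA (an effective embedding initialization) by 12.3 and 3.8 points, respectively.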

ANGOFA: Leveraging OFA Embedding Initialization and Synthetic Data for Angolan Language Models