Jun, 2024
The Privileged Students: On the Value of Initialization in Multilingual Knowledge Distillation
Haryo Akbarianto Wibowo, Thamar Solorio, Alham Fikri Aji
TL;DR
We investigate the value of knowledge distillation and of model-initialization methods in multilingual settings, finding that initializing the student by directly copying the teacher model's weights matters most across a variety of multilingual settings, and demonstrating that this efficient weight initialization preserves multilingual capabilities even in low-resource scenarios.
Abstract
Knowledge distillation (KD) has proven to be a successful strategy to improve the performance of a smaller model in many NLP tasks. However, most of the work in KD only explores monolingual scenarios. In this paper …