Multilingual Representation Distillation with Contrastive Learning

Oct, 2022

Weiting Tan, Kevin Heffernan, Holger Schwenk, Philipp Koehn

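TL;DR: This work adds contrastive learning to the distillation of multilingual representations and applies the result to quality estimation of parallel sentences. Experiments show that the method significantly outperforms previous sentence encoders, such as LASER, on a range of low-resource languages.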
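
To make the idea concrete, here is a minimal sketch of what a contrastive distillation objective and an embedding-based quality score could look like. It is an illustration under stated assumptions, not the authors' implementation: it assumes an InfoNCE-style loss with in-batch negatives, a frozen teacher encoder, and cosine similarity as the quality score. The function names, the temperature value, and the use of random vectors in place of real sentence embeddings are all hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def contrastive_distillation_loss(student_emb, teacher_emb, temperature=0.05):
    """InfoNCE-style loss (assumed form, not taken from the paper).

    Row i of student_emb is pulled toward row i of teacher_emb (its translation)
    and pushed away from every other teacher row in the batch.
    """
    s = l2_normalize(student_emb)
    t = l2_normalize(teacher_emb)
    logits = s @ t.T / temperature                       # (batch, batch) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))           # positives sit on the diagonal


def parallel_quality_score(src_emb, tgt_emb):
    """Cosine similarity per sentence pair, used here as a simple quality estimate."""
    return np.sum(l2_normalize(src_emb) * l2_normalize(tgt_emb), axis=1)


# Toy data standing in for sentence embeddings (hypothetical, random).
rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 16))
student = teacher + 0.01 * rng.normal(size=(8, 16))  # student close to the teacher

loss_aligned = contrastive_distillation_loss(student, teacher)
loss_shuffled = contrastive_distillation_loss(student, np.roll(teacher, 1, axis=0))
scores = parallel_quality_score(student, teacher)
```

In this toy setup the loss is lower when each student row is paired with its own teacher row than when the pairs are shifted by one position, and well-aligned pairs receive cosine scores close to 1. A threshold on such scores is one plausible way to filter noisy parallel data, although the summary does not say how the paper performs that step.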

Abstract

Multilingual sentence representations from large models can encode semantic information from two or more languages and can be used for different cross-lingual information retrieval tasks. In this paper, we integr