BriefGPT.xyz
May, 2019
Target Conditioned Sampling: Optimizing Data Selection for Multilingual Neural Machine Translation
Xinyi Wang, Graham Neubig
TL;DR
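This work proposes Target Conditioned Sampling (TCS), an efficient algorithm that builds a sampling distribution over all of the multilingual data so as to minimize the training loss on the low-resource language. Experiments show that TCS significantly improves BLEU on three test languages, by up to 2 points, with minimal training overhead.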
Abstract
To improve low-resource neural machine translation (NMT) with multilingual corpora, training on the most related high-resource language only is often more effective than using all data available (Neubig and Hu, 2