Data Augmentation for Low-Resource Neural Machine Translation

May 2017

Marzieh Fadaee, Arianna Bisazza, Christof Monz

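TL;DR: This work proposes a data augmentation approach that targets low-frequency words, generating new sentence pairs in which those words appear in new, synthetically created contexts, with the aim of improving the translation quality of neural machine translation systems. In experiments on simulated low-resource settings, the method outperforms both the baseline and back-translation, with gains of up to 2.9 BLEU points.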
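The idea of placing rare words into new contexts can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`rare_words`, `augment_pair`), the frequency threshold, the word alignment, and the one-entry translation lexicon are all hypothetical and chosen for illustration. The summary does not say how a substitution position or replacement word is selected, so the sketch takes both as inputs.

```python
from collections import Counter


def rare_words(sentences, max_count):
    """Return the set of tokens that occur at most max_count times."""
    counts = Counter(tok for sent in sentences for tok in sent)
    return {tok for tok, c in counts.items() if c <= max_count}


def augment_pair(src, tgt, alignment, position, new_src_word, lexicon):
    """Build a new sentence pair by swapping in a rare source word.

    alignment maps a source index to a target index (assumed given);
    lexicon maps a source word to a single target translation (assumed given).
    Returns None when the position is unaligned or no translation is known.
    """
    if position not in alignment or new_src_word not in lexicon:
        return None
    new_src = list(src)
    new_tgt = list(tgt)
    new_src[position] = new_src_word
    # Keep the pair parallel: replace the aligned target token as well.
    new_tgt[alignment[position]] = lexicon[new_src_word]
    return new_src, new_tgt


# Toy usage with made-up data.
corpus = [["cats", "sleep"], ["cats", "eat"], ["lynxes", "eat"]]
targets = rare_words(corpus, max_count=1)  # tokens seen only once
pair = augment_pair(
    src=["cats", "sleep"],
    tgt=["Katzen", "schlafen"],
    alignment={0: 0, 1: 1},
    position=0,
    new_src_word="lynxes",
    lexicon={"lynxes": "Luchse"},
)
```

In this toy run the rare word "lynxes" gains a second training context ("lynxes sleep") paired with a matching target sentence. The sketch ignores fluency and agreement checks, which a real system would need before adding the pair to the training data.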

Abstract

The quality of a neural machine translation system depends substantially on the availability of sizable parallel corpora. For low-resource language pairs this is not the case, resulting in poor →