BriefGPT.xyz
Jun, 2021
Self-Training Sampling with Monolingual Data Uncertainty for Neural Machine Translation
Wenxiang Jiao, Xing Wang, Zhaopeng Tu, Shuming Shi, Michael R. Lyu...
TL;DR
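This paper proposes a self-training method based on uncertainty sampling: it selects the most informative monolingual sentences to complement the parallel data and thereby improve NMT performance. Experiments on large-scale datasets demonstrate the method's effectiveness and show that it improves both translation quality and the prediction of low-frequency words.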
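The summary does not say how uncertainty is measured or how sentences are drawn, so the following is only a minimal sketch under stated assumptions, not the authors' procedure. It assumes that a sentence's uncertainty is the average entropy of its words' translation distributions, taken from a hypothetical word-translation count table (`lexicon`), and that sentences are sampled without replacement with probability weighted by that score. All function and parameter names are illustrative.

```python
import math
import random


def word_entropy(translation_counts):
    """Shannon entropy (in nats) of one source word's translation counts."""
    total = sum(translation_counts.values())
    return -sum((c / total) * math.log(c / total)
                for c in translation_counts.values())


def sentence_uncertainty(sentence, lexicon):
    """Average word entropy over the words found in the (hypothetical) lexicon."""
    entropies = [word_entropy(lexicon[w]) for w in sentence.split() if w in lexicon]
    return sum(entropies) / len(entropies) if entropies else 0.0


def sample_by_uncertainty(sentences, lexicon, k, seed=0, eps=1e-6):
    """Weighted sampling without replacement (Efraimidis-Spirakis keys).

    Assumption: higher-uncertainty sentences should be drawn more often.
    `eps` keeps zero-uncertainty sentences selectable with tiny probability.
    """
    rng = random.Random(seed)
    keyed = []
    for s in sentences:
        weight = sentence_uncertainty(s, lexicon) + eps
        # A larger weight pushes the key toward 1, so the sentence ranks higher.
        keyed.append((rng.random() ** (1.0 / weight), s))
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in keyed[:k]]


# Toy data: "bank" has two equally likely translations, "river" has one.
lexicon = {"bank": {"Bank": 1, "Ufer": 1}, "river": {"Fluss": 2}}
monolingual = ["bank", "river", "bank river", "unknown words"]
selected = sample_by_uncertainty(monolingual, lexicon, k=2)
```

In a self-training pipeline, the selected sentences would then be translated by a trained model to form the synthetic parallel data; that step is omitted here.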
Abstract
Self-training has proven effective for improving NMT performance by augmenting model training with synthetic parallel data. The common practice is to construct synthetic data based on a randomly sampled subset of...