TL;DR: By introducing Token Drop together with two self-supervised objectives, this work improves the generalization of neural machine translation and mitigates overfitting; experiments show the method significantly outperforms strong Transformer baselines on Chinese-English and English-Romanian benchmark datasets.
Abstract
Neural machine translation models with millions of parameters are vulnerable to unfamiliar inputs. We propose Token Drop to improve generalization and avoid overfitting for the NMT model. It is similar to word dropout, except that we replace each dropped token with a special placeholder token instead of zeroing the word embedding.
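To make the operation concrete, below is a minimal sketch of Token Drop under the description above; the function name `token_drop`, the placeholder string `<drop>`, and the drop rate `p` are illustrative assumptions rather than the paper's exact implementation.

```python
import random

def token_drop(tokens, drop_token="<drop>", p=0.1, rng=random):
    """Randomly replace tokens with a special placeholder token.

    Unlike word dropout, which zeroes the embeddings of dropped words,
    each dropped position here keeps a (learnable) placeholder symbol,
    so the model still sees that a token was present.
    """
    return [drop_token if rng.random() < p else tok for tok in tokens]

# Usage: corrupt a source sentence before encoding during training.
src = ["the", "quick", "brown", "fox", "jumps"]
print(token_drop(src, p=0.2))
```

In practice the placeholder would map to its own embedding in the vocabulary, so dropped positions carry a consistent signal the model can learn to exploit.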