BriefGPT.xyz
Jul, 2022
MAD for Robust Reinforcement Learning in Machine Translation
Domenic Donato, Lei Yu, Wang Ling, Chris Dyer
TL;DR
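Introduces MAD, a new distributed policy gradient algorithm that improves training stability and generalization performance through distributed sampling, conditional reward normalization, and robust importance weight control. The algorithm performs well when optimizing machine translation models.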
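The summary names conditional reward normalization but does not define it. As a rough illustration only, the sketch below assumes it means standardizing rewards within the group of candidate translations sampled for the same source sentence, so that each source ends up with both positive and negative training signals. The function name `normalize_rewards_per_source`, the `(source_id, reward)` input layout, and the choice of mean and population standard deviation are all assumptions made for this example, not the paper's formulation.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_rewards_per_source(samples):
    """Standardize rewards within each source sentence's candidate group.

    `samples` is a list of (source_id, reward) pairs (hypothetical layout).
    Returns the normalized rewards in the same order as the input.
    """
    # Group raw rewards by the source sentence they were sampled for.
    groups = defaultdict(list)
    for source_id, reward in samples:
        groups[source_id].append(reward)

    # Per-source center and spread (assumed: mean and population std).
    stats = {
        source_id: (mean(rewards), pstdev(rewards))
        for source_id, rewards in groups.items()
    }

    normalized = []
    for source_id, reward in samples:
        center, spread = stats[source_id]
        # If all candidates for a source share one reward, there is no
        # preference signal for that source, so return zero.
        normalized.append((reward - center) / spread if spread > 0 else 0.0)
    return normalized


if __name__ == "__main__":
    batch = [("src-1", 1.0), ("src-1", 3.0), ("src-2", 10.0), ("src-2", 10.0)]
    print(normalize_rewards_per_source(batch))
```

With this kind of per-source normalization, a candidate is rewarded relative to the other candidates for the same sentence rather than on an absolute scale, which is one common way to reduce gradient variance in policy gradient training.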
Abstract
We introduce a new distributed policy gradient algorithm and show that it outperforms existing reward-aware training procedures such as REINFORCE, minimum risk training (MRT) and proximal policy optimization (PPO) …