From Credit Assignment to Entropy Regularization: Two New Algorithms for Neural Sequence Prediction

Apr, 2018

Zihang Dai, Qizhe Xie, Eduard Hovy

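TL;DR: This paper studies the credit assignment problem in reward augmented maximum likelihood (RAML) learning. It establishes a theoretical equivalence between token-level RAML and entropy-regularized reinforcement learning. On two benchmark datasets, the two proposed algorithms outperform RAML and Actor-Critic respectively, giving new options for sequence prediction.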
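The equivalence mentioned in the TL;DR can be illustrated at a single decoding step. This is a minimal sketch, not the authors' code, and it rests on three assumptions:

- The token-level RAML target at a state is taken to be proportional to exp(Q(s, w) / τ) for some given token-level values Q.
- That target is compared with the standard optimal policy of entropy-regularized RL, exp((Q(s, w) - V(s)) / τ), where the soft value is V(s) = τ · logsumexp(Q(s, ·) / τ).
- The Q-values, the temperature and all function names are hypothetical, chosen only for illustration.

```python
import math
from typing import List


def logsumexp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x))) over a list of floats."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def token_level_raml_target(q_values: List[float], tau: float) -> List[float]:
    """Hypothetical token-level RAML target: q(w) proportional to exp(Q(s, w) / tau)."""
    scaled = [q / tau for q in q_values]
    z = logsumexp(scaled)
    return [math.exp(s - z) for s in scaled]


def soft_optimal_policy(q_values: List[float], tau: float) -> List[float]:
    """Optimal policy of entropy-regularized RL (standard soft-Q form):
    pi(w) = exp((Q(s, w) - V(s)) / tau), with V(s) = tau * logsumexp(Q(s, .) / tau)."""
    v = tau * logsumexp([q / tau for q in q_values])
    return [math.exp((q - v) / tau) for q in q_values]


if __name__ == "__main__":
    # Hypothetical Q-values for a 4-token vocabulary at one decoding step.
    q = [1.0, 0.5, -0.2, 2.0]
    tau = 0.7
    raml = token_level_raml_target(q, tau)
    soft = soft_optimal_policy(q, tau)
    print([round(p, 4) for p in raml])
    # The two distributions coincide up to floating-point error.
    print(all(abs(a - b) < 1e-12 for a, b in zip(raml, soft)))
```

The temperature τ controls how sharp the target is. As τ grows, both distributions approach uniform, which corresponds to a stronger entropy bonus. As τ shrinks, they concentrate on the highest-valued token.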

Abstract

In this work, we study the credit assignment problem in reward augmented maximum likelihood (RAML) learning, and establish a theoretical equivalence between the token-level counterpart of RAML and the entropy regularize