Teacher Forcing Recovers Reward Functions for Text Generation

Oct, 2022

Yongchang Hao, Yuxin Liu, Lili Mou

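TL;DR: We propose a task-agnostic method, based on teacher forcing, for deriving reward functions for reinforcement learning. It is stable, outperforms self-training and reward regression, and applies to text generation tasks that use reinforcement learning to alleviate exposure bias or to exploit non-parallel datasets.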

Abstract

Reinforcement learning (RL) has been widely used in text generation to alleviate the exposure bias issue or to utilize non-parallel datasets. The