BriefGPT.xyz
Oct, 2022
Teacher Forcing Recovers Reward Functions for Text Generation
Yongchang Hao, Yuxin Liu, Lili Mou
TL;DR
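We propose a task-agnostic method, based on teacher forcing, for deriving reward functions for reinforcement learning. It is stable, outperforms self-training and reward regression, and applies to text generation tasks that aim to alleviate exposure bias or to make use of non-parallel datasets.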
Abstract
Reinforcement learning (RL) has been widely used in text generation to alleviate the exposure bias issue or to utilize non-parallel datasets. The …