Jul, 2023
On the Effectiveness of Offline RL for Dialogue Response Generation
Paloma Sodhi, Felix Wu, Ethan R. Elenberg, Kilian Q. Weinberger, Ryan McDonald
TL;DR
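This study uses offline reinforcement learning (RL) to maximize sequence-level objectives in dialogue response generation. A comprehensive evaluation across multiple datasets, models, and metrics shows that offline RL clearly improves performance over teacher forcing, without causing training instability or exceeding practical training budgets.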
Abstract
A common training technique for language models is teacher forcing (TF). TF attempts to match human language exactly, even though identical meanings can be expressed in different ways. This motivates use of …