BriefGPT.xyz
Oct, 2023
Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
Haikang Deng, Colin Raffel
TL;DR
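Reward-Augmented Decoding (RAD) is a text generation procedure that uses a small unidirectional reward model to encourage a language model to generate text with specific attributes. Experiments show that RAD performs best at generating non-toxic and sentiment-controlled text, and that it is on par with state-of-the-art methods in reducing computational overhead.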
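To make the idea of reward-guided decoding concrete, here is a minimal sketch of a single decoding step. It is not the authors' implementation: the top-k restriction, the additive weight `beta`, the function names `rad_adjust` and `toy_reward`, and the toy logits are all assumptions chosen for illustration. The language model and reward model are replaced by a plain dictionary and a hand-written function.

```python
import math


def rad_adjust(logits, reward_fn, prefix, k=3, beta=2.0):
    """Reweight the top-k next-token candidates using a reward model.

    logits:    dict mapping token -> raw next-token logit (toy stand-in for an LM).
    reward_fn: callable taking a token list and returning a reward in [0, 1]
               (hypothetical stand-in for the small reward model).
    prefix:    list of tokens generated so far.
    k, beta:   assumed hyperparameters (candidate count and reward weight).
    """
    # Keep only the k most likely candidates under the language model.
    top = sorted(logits, key=logits.get, reverse=True)[:k]
    # Shift each candidate's logit by beta times the reward of the extended prefix.
    adjusted = {t: logits[t] + beta * reward_fn(prefix + [t]) for t in top}
    # Numerically stable softmax over the adjusted logits.
    m = max(adjusted.values())
    exps = {t: math.exp(v - m) for t, v in adjusted.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}


def toy_reward(tokens):
    # Hypothetical reward: penalize one "undesired" token, reward everything else.
    return 0.0 if tokens[-1] == "awful" else 1.0


lm_logits = {"awful": 2.0, "fine": 1.5, "great": 1.0, "the": -1.0}
probs = rad_adjust(lm_logits, toy_reward, ["this", "is"], k=3, beta=2.0)
best = max(probs, key=probs.get)
```

In this toy setup the language model alone prefers "awful", but after the reward shift the most probable candidate becomes "fine". Because the reward function only ever sees a prefix plus one candidate token, a unidirectional (left-to-right) reward model could in principle reuse its computation for the shared prefix, which is the kind of efficiency the summary alludes to.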
Abstract
While large language models have proven effective in a huge range of downstream applications, they often generate text that is problematic or lacks a desired attribute. In this paper, we introduce reward-augmented decoding…