While large language models have proven effective in a wide range of downstream applications, they often generate text that is problematic or lacks a desired attribute. In this paper, we introduce Reward-Augmented Decoding (RAD), a text generation procedure that uses a small unidirectional reward model to encourage a language model to generate text that has certain properties. Specifically, RAD uses the reward model to score generations as they are produced and rescales sampling probabilities to favor high-reward tokens. By using a unidirectional reward model, RAD can cache activations from prior generation steps to decrease computational overhead. Through experiments on generating non-toxic and sentiment-controlled text, we demonstrate that RAD performs best among methods that change only the generation procedure and matches the performance of state-of-the-art methods that involve re-training the language model. We further validate that RAD is effective on very large language models while incurring minimal computational overhead.
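The rescaling step described above can be illustrated with a minimal sketch. The abstract does not give the exact rule, so everything here is an assumption made for illustration: the function name `reward_adjusted_probs`, the restriction to the `top_k` highest-logit candidates, and the additive form `logit + beta * reward` followed by a softmax. Activation caching in the reward model is not shown.

```python
import math

def reward_adjusted_probs(logits, rewards, beta=1.0, top_k=None):
    """Reweight next-token probabilities using per-token reward scores.

    logits  : dict mapping candidate token -> language-model logit
    rewards : dict mapping candidate token -> reward score (higher is better)
    beta    : hypothetical steering strength; 0 leaves the distribution unchanged
    top_k   : if set, only the top_k highest-logit tokens are kept (assumption)
    """
    # Rank candidates by the language model's own logits.
    candidates = sorted(logits, key=logits.get, reverse=True)
    if top_k is not None:
        candidates = candidates[:top_k]

    # Assumed rescaling rule: shift each logit by beta times its reward.
    adjusted = {tok: logits[tok] + beta * rewards[tok] for tok in candidates}

    # Numerically stable softmax over the adjusted logits.
    peak = max(adjusted.values())
    exps = {tok: math.exp(val - peak) for tok, val in adjusted.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Toy values, invented for illustration only.
logits = {"kind": 1.0, "rude": 1.0, "table": -2.0}
rewards = {"kind": 0.9, "rude": 0.1, "table": 0.5}
probs = reward_adjusted_probs(logits, rewards, beta=5.0, top_k=2)
```

In this toy setup the two leading tokens start with equal logits, so any difference in their final probabilities comes from the reward term alone; setting `beta=0` recovers the unmodified top-k distribution.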

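Reward-Augmented Decoding (RAD) is a text generation procedure that uses a small unidirectional reward model to encourage a language model to generate text with specific attributes. Experiments on non-toxic and sentiment-controlled generation show that RAD performs best among methods that change only the generation procedure and matches state-of-the-art methods that re-train the language model, while adding minimal computational overhead.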

Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model