BriefGPT.xyz
Feb, 2024
Noise Contrastive Alignment of Language Models with Explicit Rewards
Huayu Chen, Guande He, Hang Su, Jun Zhu
TL;DR
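Using noise contrastive estimation (NCE) to handle explicitly annotated reward data yields better performance and stability in language model (LM) alignment than Direct Preference Optimization (DPO).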
Abstract
User intentions are typically formalized as evaluation rewards to be maximized when fine-tuning language models (LMs). Existing