May, 2024
Robust Preference Optimization through Reward Model Distillation
Adam Fisch, Jacob Eisenstein, Vicky Zayats, Alekh Agarwal, Ahmad Beirami...
TL;DR
Combines pretraining, direct preference optimization, and distillation to improve robustness to shifts in the preference-data distribution during offline alignment, while retaining the simple supervised-learning character of the procedure.
Abstract
Language model (LM) post-training (or alignment) involves maximizing a reward function that is derived from preference annotations.
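
For orientation, below is a minimal sketch of the standard direct preference optimization (DPO) loss that this line of work builds on, where the policy's implicit reward for a response is the scaled log-ratio against a frozen reference model. This is not the paper's distillation objective; the tensor names (`policy_logp_chosen`, etc.) are illustrative assumptions, each holding the summed log-probability of a response under the policy or the reference model.

```python
# Minimal sketch of the DPO loss (Rafailov et al., 2023), assuming
# per-example summed response log-probabilities have been precomputed.
import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_chosen: torch.Tensor,
             policy_logp_rejected: torch.Tensor,
             ref_logp_chosen: torch.Tensor,
             ref_logp_rejected: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # Implicit reward: r(x, y) = beta * log(pi(y|x) / pi_ref(y|x)).
    chosen_reward = beta * (policy_logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (policy_logp_rejected - ref_logp_rejected)
    # Bradley-Terry negative log-likelihood: small when the chosen
    # response clearly out-rewards the rejected one.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```

A distillation-based variant, as the TL;DR suggests, would instead fit the policy's implicit rewards to an explicitly trained reward model rather than to raw preference labels, which is what is claimed to confer robustness to distribution shift.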