BriefGPT.xyz
Jul, 2024
HAF-RM: A Hybrid Alignment Framework for Reward Model Training
Shujun Liu, Xiaoyu Shen, Yuhang Lai, Siyuan Wang, Shengbin Yue...
TL;DR
The hybrid alignment framework (HaF-RM) trains the reward model by introducing an additional constraint on token-level policy probabilities, simultaneously supervising the internal preference model at the token level and optimizing the reward model's mapping layer. By decoupling the reward modeling process and combining hybrid supervision, the HaF-RM framework provides a principled and effective approach to improving the performance and alignment of reward models.
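The hybrid supervision described above can be sketched as a combined loss: a sequence-level preference loss on the reward head plus a token-level constraint on policy log-probabilities. The function below is a minimal illustration assuming a Bradley-Terry reward loss and a DPO-style policy term against a frozen reference model; the names, shapes, and weighting are assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of a hybrid alignment objective in the spirit of HaF-RM.
# All argument names and the exact combination are illustrative assumptions.

def hybrid_loss(r_chosen, r_rejected,
                logp_chosen, logp_rejected,
                ref_logp_chosen, ref_logp_rejected,
                beta=0.1, alpha=0.5):
    # Sequence-level Bradley-Terry loss on the reward head's scalar outputs.
    reward_loss = -F.logsigmoid(r_chosen - r_rejected).mean()
    # Token-level policy constraint: a DPO-style margin on summed policy
    # log-probabilities, measured against a frozen reference model.
    policy_margin = beta * ((logp_chosen - ref_logp_chosen)
                            - (logp_rejected - ref_logp_rejected))
    policy_loss = -F.logsigmoid(policy_margin).mean()
    # Hybrid objective: jointly supervise the reward mapping and the policy.
    return reward_loss + alpha * policy_loss
```

In this sketch, `alpha` trades off the two supervision signals; a larger `alpha` puts more weight on the token-level policy constraint relative to the sequence-level reward loss.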
Abstract
The reward model has become increasingly important in alignment, assessment, and data construction for large …