Apr, 2025
CHARM: Calibrating Reward Models With Chatbot Arena Scores
Xiao Zhu, Chenmien Tan, Pinzhen Chen, Rico Sennrich, Yanlin Zhang...
TL;DR
This study addresses the problem of model preference bias in reward models, which causes responses from certain policy models to be scored improperly. It proposes a calibration method named CHARM that uses Elo scores from Chatbot Arena to reduce reward-model overestimation, thereby improving evaluation accuracy and correlation with human preferences, and ultimately enabling the construction of fairer and more reliable reward models.
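The paper's exact CHARM procedure is not detailed on this page. As a rough illustration of the idea the TL;DR describes (using Arena Elo scores to correct per-model overestimation by a reward model), here is a minimal sketch assuming a simple linear relation between a model's Elo rating and its mean reward-model score; all function names and inputs are hypothetical, not the authors' implementation.

```python
# Illustrative sketch only -- NOT the paper's CHARM algorithm.
# Assumption: a policy model's mean RM score should track its Chatbot
# Arena Elo linearly; deviation from that trend is treated as bias.
import numpy as np

def calibrate_rm_scores(rm_scores: dict[str, np.ndarray],
                        elo: dict[str, float]) -> dict[str, np.ndarray]:
    """Remove model-specific offsets from raw reward-model scores.

    rm_scores: per-model arrays of raw RM scores (hypothetical inputs).
    elo: Chatbot Arena Elo rating for each model.
    """
    models = sorted(rm_scores)
    mean_rm = np.array([rm_scores[m].mean() for m in models])
    elo_arr = np.array([elo[m] for m in models])

    # Fit mean RM score as a linear function of Elo (least squares).
    slope, intercept = np.polyfit(elo_arr, mean_rm, deg=1)

    calibrated = {}
    for m in models:
        expected = slope * elo[m] + intercept   # Elo-predicted mean score
        bias = rm_scores[m].mean() - expected   # model preference bias
        calibrated[m] = rm_scores[m] - bias     # subtract the overestimate
    return calibrated
```

Under this toy scheme, a model whose responses the RM over-scores relative to its Arena standing has its scores shifted down, which is one plausible way a calibration against Elo could reduce overestimation.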
Abstract
Reward Models (RMs) play a crucial role in Reinforcement Learning from Human Feedback by serving as proxies for human preferences in aligning large language models. In this paper, we identify a model preference bias...