Jan, 2025
AlphaPO -- Reward shape matters for LLM alignment
Aman Gupta, Shao Tang, Qingquan Song, Sirou Zhu, Jiwoo Hong...
TL;DR
This paper addresses the absence of an explicit reward model in direct alignment algorithms (DAAs) for large language model alignment and proposes a new method, AlphaPO, arguing that the shape of the reward function matters for model performance. By introducing an $\alpha$ parameter, AlphaPO provides fine-grained control over likelihood displacement and over-optimization, yielding a relative improvement of roughly 7% to 10% in alignment performance over SimPO.
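To make "reward shape" concrete, the following is a minimal, hypothetical sketch rather than the paper's exact parameterization: it assumes an AlphaPO-style shaping that applies a Box-Cox-like transform, controlled by $\alpha$, to a SimPO-style length-normalized log-likelihood reward, recovering the plain log reward in the limit $\alpha \to 0$. The function names simpo_reward and alpha_shaped_reward and the default values of alpha and beta are illustrative assumptions.

import math

def simpo_reward(logp_sum: float, length: int, beta: float = 2.0) -> float:
    # SimPO-style reward: length-normalized sequence log-likelihood, scaled by beta.
    return beta * logp_sum / length

def alpha_shaped_reward(logp_sum: float, length: int,
                        alpha: float = 0.1, beta: float = 2.0) -> float:
    # Hypothetical alpha-shaped reward (illustrative, not the paper's exact form):
    # applies the Box-Cox-style transform f_alpha(z) = (z**alpha - 1) / alpha
    # to exp(simpo_reward), so varying alpha bends the reward curve.
    r = simpo_reward(logp_sum, length, beta)
    if abs(alpha) < 1e-8:
        return r  # the limit alpha -> 0 reproduces the plain log reward
    return (math.exp(alpha * r) - 1.0) / alpha

Bending the curve this way changes how strongly high-likelihood completions are pushed further up during optimization, which is the kind of knob the TL;DR credits with controlling likelihood displacement and over-optimization.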
Abstract
Reinforcement Learning with Human Feedback (RLHF) and its variants have made huge strides toward the effective alignment of Large Language Models…