BriefGPT.xyz
May, 2024
Hybrid Preference Optimization: Augmenting Direct Preference Optimization with Auxiliary Objectives
Anirudhan Badrinath, Prabhat Agarwal, Jiajing Xu
TL;DR
Aligning large language models is a complex challenge. This paper proposes Hybrid Preference Optimization (HPO), which combines direct preference optimization with reinforcement-learning-based methods to generalize effectively to both user preferences and auxiliary design objectives, while maintaining alignment performance across a range of challenging benchmarks and model sizes.
Abstract
For aligning large language models (LLMs), prior work has leveraged reinforcement learning via human feedback (RLHF) or variations of direct preference optimization (DPO)…
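To make the idea concrete, below is a minimal sketch of how a hybrid objective of this general shape could be formed: the standard DPO preference loss plus a weighted auxiliary term. The combination rule, the REINFORCE-style auxiliary surrogate, and the names `hybrid_loss`, `aux_reward`, `alpha` are illustrative assumptions, not the paper's actual method.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # Standard DPO loss: -log sigmoid(beta * implicit reward margin),
    # where the margin compares policy vs. reference log-probabilities.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

def hybrid_loss(logp_chosen, logp_rejected,
                ref_logp_chosen, ref_logp_rejected,
                aux_reward, beta=0.1, alpha=0.5):
    # Hypothetical combination: DPO preference term plus a weighted
    # REINFORCE-style surrogate that pushes up the log-probability of
    # responses scoring well on an auxiliary design objective (assumption).
    pref = dpo_loss(logp_chosen, logp_rejected,
                    ref_logp_chosen, ref_logp_rejected, beta)
    aux = -alpha * aux_reward * logp_chosen
    return pref + aux
```

With `alpha=0` the auxiliary term vanishes and the objective reduces to plain DPO; a larger preference margin under the policy lowers the loss, as expected.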