BriefGPT.xyz
Feb, 2025
Distributionally Robust Direct Preference Optimization
Zaiyan Xu, Sushil Vemuri, Kishan Panaganti, Dileep Kalathil, Rahul Jain...
TL;DR
This work addresses the problem of distribution shift in aligning large language models with human preferences, proposing two novel distributionally robust direct preference optimization algorithms: Wasserstein DPO (WDPO) and Kullback-Leibler DPO (KLDPO). Our experiments show that both algorithms significantly improve alignment under preference distribution shift, demonstrating strong practical potential.
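Both WDPO and KLDPO build on the standard DPO objective. This summary does not give the paper's robust formulations, so the sketch below shows only the vanilla DPO loss for a single preference pair, assuming per-response log-probabilities from the policy and a frozen reference model; all function and parameter names are illustrative.

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Vanilla DPO loss for one preference pair (a sketch, not the
    paper's WDPO/KLDPO variants, which robustify this objective
    against shifts in the preference distribution)."""
    # Implicit reward margin: how much more the policy favors the chosen
    # response over the rejected one, relative to the reference model.
    margin = ((logp_chosen - ref_logp_chosen)
              - (logp_rejected - ref_logp_rejected))
    # Negative log-sigmoid of the temperature-scaled margin.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

A larger margin (the policy separating chosen from rejected more strongly than the reference does) drives the loss toward zero; a negative margin inflates it.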
Abstract
A major challenge in aligning large language models (LLMs) with human preferences is the issue of distribution shift. LLM alignment algorithms rely on static preference datasets, assuming that they accurately rep…