BriefGPT.xyz
May, 2024
SPO: Multi-Dimensional Preference Sequential Alignment With Implicit Reward Modeling
Xingzhou Lou, Junge Zhang, Jian Xie, Lifeng Liu, Dong Yan...
TL;DR
This work proposes a sequential optimization method for aligning large language models with human preferences across multiple dimensions. It avoids explicit reward modeling while achieving alignment on several preference dimensions, and experiments show it outperforms baseline models.
Abstract
Human preference alignment is critical in building powerful and reliable large language models (LLMs). However, current methods either ignore the multi-dimensionality of human preferences (e.g. helpfulness and harmlessness)…