BriefGPT.xyz
Sep, 2023
Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints
Chaoqi Wang, Yibo Jiang, Chenghao Yang, Han Liu, Yuxin Chen
TL;DR
By combining reinforcement learning from human feedback with diverse divergence constraints, large language models (LLMs) can be aligned with human preferences more efficiently, improving alignment performance.
Abstract
The increasing capabilities of large language models (LLMs) raise opportunities for artificial general intelligence but concurrently amplify safety concerns, such as potential misuse of AI systems, necessitating effective AI alignment.
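For context, the title refers to Direct Preference Optimization (DPO), whose standard objective is derived from a reverse-KL constraint against a reference policy; the paper proposes generalizing this to other divergences. A sketch of the standard DPO objective from the prior literature (not stated in this excerpt), where $\beta$ is the regularization strength and $(y_w, y_l)$ are the preferred and dispreferred responses:

$$\mathcal{L}_{\text{DPO}}(\pi_\theta; \pi_{\text{ref}}) = -\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)\right]$$

The log-ratio terms arise specifically from the reverse-KL regularizer; replacing that regularizer with a different divergence changes the implied reward parameterization, which is the generalization the title describes.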