BriefGPT.xyz
Oct, 2023
Aligning Language Models with Human Preferences via a Bayesian Approach
Jiashuo Wang, Haozhao Wang, Shichao Sun, Wenjie Li
TL;DR
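This paper proposes a novel approach, d-PM, which adopts a Bayesian framework to account for the distribution of disagreements among human preferences. Preference scores from the d-PM model are then used to train natural language generation models with a contrastive learning strategy. Experiments show that the method consistently outperforms previous state-of-the-art models in both automatic and human evaluations.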
Abstract
In the quest to advance human-centric natural language generation (NLG) systems, ensuring alignment between NLG models and human preferences is crucial. For this alignment, current popular methods leverage a