Apr 2023
RRHF: Rank Responses to Align Language Models with Human Feedback without tears
Zheng Yuan, Hongyi Yuan, Chuanqi Tan, Wei Wang, Songfang Huang...
TL;DR
RRHF is a new learning paradigm that scores generated responses through a ranking loss, effectively aligning language model outputs with human preferences; it requires only 1 to 2 models during tuning and achieves performance comparable to fine-tuning.
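The ranking objective the TL;DR refers to can be sketched in a few lines. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: `rrhf_loss` is a hypothetical helper, the length-normalized log-probabilities of the k candidate responses are assumed precomputed, and the fine-tuning term is approximated by the negative log-probability of the best-scored response.

```python
import torch

def rrhf_loss(logprobs: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    """Sketch of an RRHF-style objective for one query (hypothetical helper).

    logprobs : (k,) length-normalized log-probabilities the model assigns
               to k candidate responses.
    rewards  : (k,) scores for those responses (reward model or human).
    """
    # Ranking loss: for every pair where response i is scored below
    # response j, penalize the model if it assigns i a higher
    # likelihood than j, hinged at zero: max(0, p_i - p_j).
    diff = logprobs.unsqueeze(1) - logprobs.unsqueeze(0)  # diff[i, j] = p_i - p_j
    worse = rewards.unsqueeze(1) < rewards.unsqueeze(0)   # worse[i, j]: r_i < r_j
    l_rank = torch.clamp(diff[worse], min=0).sum()

    # Fine-tuning loss on the best-scored response, approximated here
    # by its negative log-probability.
    l_ft = -logprobs[rewards.argmax()]

    return l_rank + l_ft


# Toy usage: three candidate responses with made-up scores.
lp = torch.tensor([-1.2, -0.8, -2.0], requires_grad=True)
rw = torch.tensor([0.3, 0.9, 0.1])
loss = rrhf_loss(lp, rw)
loss.backward()
```

Because the loss only compares log-probabilities of already-sampled responses, tuning needs no separate value, reward, or reference model in the loop, which is where the "1 to 2 models" claim comes from.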
Abstract
Reinforcement learning from human feedback (RLHF) facilitates the alignment of large language models with human preferences, significantly …