BriefGPT.xyz
May, 2023
SLiC-HF: Sequence Likelihood Calibration with Human Feedback
Yao Zhao, Rishabh Joshi, Tianqi Liu, Misha Khalman, Mohammad Saleh...
TL;DR
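This paper shows how Sequence Likelihood Calibration (SLiC) can be used to learn effectively from human feedback. Human evaluation experiments show that the method significantly improves on supervised fine-tuning baselines and is competitive with PPO RLHF. Compared with past work, SLiC-HF is also simpler to implement, easier to tune, and more computationally efficient.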
Abstract
Learning from human feedback has been shown to be effective at aligning language models with human preferences. Past work has often relied on Reinforcement Learning from