BriefGPT.xyz
Jun, 2023
Is RLHF More Difficult than Standard RL?
Yuanhao Wang, Qinghua Liu, Chi Jin
TL;DR
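This paper proves that, for a wide range of preference models, preference-based reinforcement learning can be solved directly with existing algorithms and techniques, at little or no extra cost.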
Abstract
Reinforcement learning from human feedback (RLHF) learns from preference signals, while standard reinforcement learning (RL) directly lear…