Timo Kaufmann, Paul Weng, Viktor Bengs, Eyke Hüllermeier
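TL;DR: An in-depth look at the fundamental principles, applications, and research trends of reinforcement learning from human feedback (RLHF) in human-computer interaction.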
Abstract
Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning (RL) that learns from human feedback instead of relying on an engineered reward function. Building on prior work on the rel