BriefGPT.xyz
Feb, 2024
A Framework for Partially Observed Reward-States in RLHF
Chinmaya Kausik, Mirco Mutti, Aldo Pacchiano, Ambuj Tewari
TL;DR
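Models reinforcement learning from human feedback (RLHF) as reinforcement learning with partially observed reward-states, and proposes statistically efficient algorithms by reducing the two main forms of human feedback, cardinal feedback and dueling feedback, to that setting.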
Abstract
The study of reinforcement learning from human feedback (RLHF) has gained prominence in recent years due to its role in the development of LLMs. Neuroscience research shows that human responses to stimuli are kno…