BriefGPT.xyz
Feb, 2023
A State Augmentation based approach to Reinforcement Learning from Human Preferences
Mudit Verma, Subbarao Kambhampati
TL;DR
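This paper proposes a state augmentation technique for learning a reward model from binary human feedback on agent behavior, so that the learned reward better supports reinforcement learning. It validates the approach in three task domains: Mountain Car, Quadruped-Walk, and Sweep-Into.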
Abstract
Reinforcement learning has suffered from poor reward specification and from issues of reward hacking, even in simple enough domains. Preference-based reinforcement learning attempts to solve the issue by utilizing bi