BriefGPT.xyz
May, 2022
Reward Uncertainty for Exploration in Preference-based Reinforcement Learning
Xinran Liang, Katherine Shu, Kimin Lee, Pieter Abbeel
TL;DR
This paper proposes a novel exploration method based on uncertainty in the learned reward to address the inefficiency of human feedback in current preference-based reinforcement learning algorithms, and demonstrates its effectiveness on complex robotic manipulation tasks from the MetaWorld benchmark.
Abstract
Conveying complex objectives to reinforcement learning (RL) agents often requires meticulous reward engineering. Preference-based RL methods are able to learn a more flexible reward model based on human preferences …
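The title and TL;DR indicate that the exploration signal comes from uncertainty in the reward learned from human preferences. As a rough illustration only (not the authors' released implementation), one way to realize this is to train an ensemble of reward models on preference labels and treat their disagreement on a state-action pair as an intrinsic exploration bonus added to the predicted reward. The names `RewardModel`, `exploration_bonus`, `total_reward`, and the weight `beta` below are hypothetical.

```python
# Minimal sketch: uncertainty-driven exploration with a learned reward ensemble.
# Assumes disagreement (std) across ensemble predictions is used as an intrinsic bonus.
import torch
import torch.nn as nn


class RewardModel(nn.Module):
    """Small MLP predicting a scalar reward from a (state, action) pair."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)


def exploration_bonus(ensemble, obs, act):
    """Intrinsic reward: disagreement (std) of the reward ensemble on (obs, act)."""
    with torch.no_grad():
        preds = torch.stack([m(obs, act) for m in ensemble], dim=0)  # (N, batch)
    return preds.std(dim=0)                                          # (batch,)


def total_reward(ensemble, obs, act, beta: float):
    """Ensemble-mean reward plus a beta-weighted uncertainty bonus for exploration."""
    with torch.no_grad():
        preds = torch.stack([m(obs, act) for m in ensemble], dim=0)
    return preds.mean(dim=0) + beta * preds.std(dim=0)


if __name__ == "__main__":
    obs_dim, act_dim = 39, 4  # dimensions chosen arbitrarily for the demo
    ensemble = [RewardModel(obs_dim, act_dim) for _ in range(3)]
    obs, act = torch.randn(8, obs_dim), torch.randn(8, act_dim)
    print(total_reward(ensemble, obs, act, beta=0.05))
```

In such a setup the bonus weight is typically decayed over training so the agent explores where the reward model is uncertain early on and exploits the learned reward later; the schedule shown here is left out for brevity.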