BriefGPT.xyz
May, 2019
Reinforcement Learning without Ground-Truth State
Xingyu Lin, Harjatin Singh Baweja, David Held
TL;DR
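The paper proposes a simple indicator reward function to address the difficulty of specifying a reward from high-dimensional observations when training policies with reinforcement learning in continuous state spaces. It further proposes two techniques, reward balancing and reward filtering, to speed up convergence when learning with the indicator reward. Without access to the ground-truth state, the method performs complex tasks such as rope manipulation from RGB-D images with performance comparable to an oracle method that uses the ground-truth state.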
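The summary names the indicator reward and reward balancing without defining them, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes the indicator reward is 1 when an observation matches the goal observation exactly and 0 otherwise, and that reward balancing means sampling training batches with a fixed share of positive-reward transitions. The function names, the `positive_fraction` parameter, and the uniform-sampling fallback are all hypothetical.

```python
import numpy as np


def indicator_reward(observation, goal_observation):
    """Return 1.0 only when the observation equals the goal observation, else 0.0.

    Assumption: exact equality on raw observations; no state estimator is used.
    """
    return float(np.array_equal(observation, goal_observation))


def balanced_batch_indices(rewards, batch_size, positive_fraction=0.5, rng=None):
    """Sample replay indices so about `positive_fraction` of them have reward 1.

    Assumption: this is one simple way to balance sparse positive rewards
    against the far more common zero rewards.
    """
    rng = np.random.default_rng() if rng is None else rng
    rewards = np.asarray(rewards)
    positive = np.flatnonzero(rewards > 0)
    negative = np.flatnonzero(rewards <= 0)
    if len(positive) == 0 or len(negative) == 0:
        # Only one class is present, so fall back to uniform sampling.
        return rng.integers(0, len(rewards), size=batch_size)
    n_pos = int(round(batch_size * positive_fraction))
    pos_idx = rng.choice(positive, size=n_pos, replace=True)
    neg_idx = rng.choice(negative, size=batch_size - n_pos, replace=True)
    return np.concatenate([pos_idx, neg_idx])
```

Sampling with replacement keeps the sketch usable when positive transitions are rare, at the cost of repeating them within a batch.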
Abstract
To perform robot manipulation tasks, a low dimension state of the environment typically needs to be estimated. However, designing a state estimator can sometimes be difficult, especially in environments with defo…