Ideally, we would place a robot in a real-world environment and leave it
there improving on its own by gathering more experience autonomously. However,
algorithms for autonomous robotic learning have been challenging to realize in
the real world. While this has often been attributed to the challenge of sample
complexity, even sample-efficient techniques are hampered by two major
challenges - the difficulty of providing well "shaped" rewards, and the
difficulty of continual reset-free training. In this work, we describe a system
for real-world reinforcement learning that enables agents to show continual
improvement by training directly in the real world without requiring
painstaking effort to hand-design reward functions or reset mechanisms. Our
system leverages occasional non-expert human-in-the-loop feedback from remote
users to learn informative distance functions to guide exploration while
leveraging a simple self-supervised learning algorithm for goal-directed policy
learning. We show that in the absence of resets, it is particularly important
to account for the current "reachability" of the exploration policy when
deciding which regions of the space to explore. Based on this insight, we
instantiate a practical learning system - GEAR, which enables robots to simply
be placed in real-world environments and left to train autonomously without
interruption. The system streams robot experience to a web interface only
requiring occasional asynchronous feedback from remote, crowdsourced,
non-expert humans in the form of binary comparative feedback. We evaluate this
system on a suite of robotic tasks in simulation and demonstrate its
effectiveness at learning behaviors both in simulation and the real world.
Project website this https URL

实现自主学习的算法对于在真实环境中的机器人来说一直是个挑战，但本研究描述了一个实际的强化学习系统，通过在真实环境中进行训练，并借助人类的反馈来实现不间断的改进。该系统在不需要设计奖励函数或重置机制的情况下，通过自我监督学习算法和人类反馈产生的信息来指导探索和筛选学习策略。在模拟环境和真实世界中的机器人任务上的评估结果表明，该系统能够有效地学习行为。