Deep Reinforcement Learning has been successfully applied to learn robotic
control. However, the corresponding algorithms struggle when applied to
problems where the agent is only rewarded after achieving a complex task. In
this context, using demonstrations can significantly speed up the learning
process, but demonstrations can be costly to acquire. In this paper, we propose
to leverage a sequential bias to learn control policies for complex robotic
tasks using a single demonstration. To do so, our method learns a
goal-conditioned policy to control a system between successive low-dimensional
goals. This sequential goal-reaching approach raises a problem of compatibility
between successive goals: we need to ensure that the state resulting from
reaching a goal is compatible with the achievement of the following goals. To
tackle this problem, we present a new algorithm called DCIL-II. We show that
DCIL-II solves challenging simulated tasks, such as humanoid locomotion and
stand-up as well as fast running with a simulated Cassie robot, with
unprecedented sample efficiency. By leveraging sequentiality, our method is a
step towards solving complex robotic tasks with minimal specification effort,
a key feature for the next generation of autonomous robots.

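This work uses deep reinforcement learning to learn, from a single demonstration, a goal-conditioned policy for controlling complex robotic tasks. It proposes the DCIL-II algorithm to address the problem of compatibility between successive goals, and demonstrates unprecedented sample efficiency in simulated environments.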