Developing agents that can execute multiple skills by learning from
pre-collected datasets is an important problem in robotics, where online
interaction with the environment is extremely time-consuming. Moreover,
manually designing reward functions for every single desired skill is
prohibitive. Prior work has addressed these challenges by learning goal-conditioned
policies from offline datasets without manually specified rewards, through
hindsight relabelling. These methods suffer from sparse rewards
and fail at long-horizon tasks. In this work, we propose a novel
self-supervised learning phase on the pre-collected dataset to understand the
structure and the dynamics of the model, and shape a dense reward function for
learning policies offline. We evaluate our method on three continuous control
tasks, and show that our model significantly outperforms existing approaches,
especially on tasks that involve long-term planning.
