Reinforcement learning provides a general framework for learning robotic
skills while minimizing engineering effort. However, most reinforcement
learning algorithms assume that a well-designed reward function is provided,
and learn a single behavior for that single reward function. Such reward
functions can be difficult to design in practice. Can we instead develop
efficient reinforcement learning methods that acquire diverse skills without
any reward function, and then repurpose these skills for downstream tasks? In
this paper, we demonstrate that a recently proposed unsupervised skill
discovery algorithm can be extended into an efficient off-policy method, making
it suitable for performing unsupervised reinforcement learning in the real
world. First, we show that our proposed algorithm substantially improves
learning efficiency, making reward-free real-world training
feasible. Second, we move beyond simulation environments and evaluate the
algorithm on real physical hardware. On quadrupeds, we observe that locomotion
skills with diverse gaits and different orientations emerge without any rewards
or demonstrations. We also demonstrate that the learned skills can be composed
using model predictive control for goal-oriented navigation, without any
additional training.

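Summary: the unsupervised skill discovery algorithm proposed in this paper enables efficient unsupervised reinforcement learning, and the learned skills are composed with model predictive control for goal-oriented navigation.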
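
To make the idea of composing skills with model predictive control concrete, here is a minimal Python sketch. It is a toy illustration under stated assumptions, not the method evaluated in the paper: the four discrete skills, the fixed per-skill displacement table `SKILL_DISPLACEMENT`, and the helpers `predict`, `plan_skill`, and `navigate` are all hypothetical. The planner searches every short skill sequence, scores each by the summed predicted distance to the goal, executes only the first skill, and then replans.

```python
import itertools
import math

# Hypothetical stand-in for a skill-dynamics model: each discrete skill is
# assumed to move the robot by a fixed (dx, dy) per step. A learned predictive
# model would take the place of this table.
SKILL_DISPLACEMENT = {
    0: (1.0, 0.0),
    1: (-1.0, 0.0),
    2: (0.0, 1.0),
    3: (0.0, -1.0),
}


def predict(state, skill):
    """Predict the next (x, y) state after executing one skill."""
    dx, dy = SKILL_DISPLACEMENT[skill]
    return (state[0] + dx, state[1] + dy)


def plan_skill(state, goal, horizon=3):
    """Return the first skill of the lowest-cost skill sequence.

    The cost of a sequence is the sum of predicted distances to the goal
    over the planning horizon. All sequences are enumerated exhaustively,
    which is only practical for a tiny skill set like this one.
    """
    best_cost, best_plan = math.inf, None
    for plan in itertools.product(SKILL_DISPLACEMENT, repeat=horizon):
        s, cost = state, 0.0
        for skill in plan:
            s = predict(s, skill)
            cost += math.dist(s, goal)
        if cost < best_cost:
            best_cost, best_plan = cost, plan
    return best_plan[0]


def navigate(start, goal, max_steps=20, tol=1e-9):
    """Receding-horizon loop: plan, execute one skill, then replan."""
    state, path = start, [start]
    for _ in range(max_steps):
        if math.dist(state, goal) <= tol:
            break
        state = predict(state, plan_skill(state, goal))
        path.append(state)
    return path


path = navigate((0.0, 0.0), (3.0, 2.0))
print(path[-1])
```

No training happens inside the planning loop: the skills and the predictive model are held fixed, and only the choice of which skill to execute changes with the goal.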