Mutual information-based reinforcement learning (RL) has been proposed as a promising framework for acquiring complex skills autonomously, without a task-oriented reward function, through mutual information (MI) maximization or variational empowerment. However, learning complex skills remains challenging because the order in which skills are trained can strongly affect sample efficiency. Motivated by this, we recast variational empowerment as curriculum learning in goal-conditioned RL with an intrinsic reward function, which we name Variational Curriculum RL (VCRL). From this perspective, we propose a novel information-theoretic approach to unsupervised skill discovery, called Value Uncertainty Variational Curriculum (VUVC). We prove that, under regularity conditions, VUVC accelerates the increase of entropy in the visited states compared to the uniform curriculum. We validate the effectiveness of our approach on complex navigation and robotic manipulation tasks in terms of sample efficiency and state coverage speed. We also demonstrate that the skills discovered by our method successfully complete a real-world robot navigation task in a zero-shot setup, and that combining these skills with a global planner further improves performance.
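
For readers unfamiliar with the term, the MI objective behind variational empowerment is usually written as the standard variational lower bound shown here. The notation is an assumption made for illustration and does not come from the abstract: g denotes a goal (skill) drawn from a goal distribution p(g), s a state visited by the goal-conditioned policy \pi, and q_\lambda a learned variational posterior.

\[
I(S; G) \;=\; H(G) - H(G \mid S) \;\ge\; \mathbb{E}_{g \sim p(g),\; s \sim \pi(\cdot \mid g)}\big[\log q_\lambda(g \mid s) - \log p(g)\big]
\]

The gap in the bound is \mathbb{E}_{s}\big[D_{\mathrm{KL}}\big(p(g \mid s) \,\|\, q_\lambda(g \mid s)\big)\big] \ge 0. Under this reading, \log q_\lambda(g \mid s) plays the role of an intrinsic reward for goal-conditioned RL, and the choice of p(g) plays the role of a curriculum over goals.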

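Mutual information-based reinforcement learning has been proposed as a way to acquire complex skills autonomously without a task-oriented reward function, but learning complex skills remains challenging because the order in which skills are trained can strongly affect sample efficiency. This paper proposes Variational Curriculum RL (VCRL), which treats variational empowerment as curriculum learning in goal-conditioned RL with an intrinsic reward function, and, building on information theory, introduces a new approach to unsupervised skill discovery called Value Uncertainty Variational Curriculum (VUVC). We prove that, under certain regularity conditions, VUVC accelerates the increase of visited-state entropy compared with a uniform curriculum. We validate the effectiveness of our method on complex navigation and robotic manipulation tasks, and, using a real-world robot navigation task in a zero-shot setup, show that the skills discovered by our method complete the task and that combining them with a global planner further improves performance.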

Variational Curriculum Reinforcement Learning for Unsupervised Skill Discovery