Reinforcement learning (RL) is a powerful approach for acquiring a well-performing policy. However, learning diverse skills is challenging in RL because the commonly used Gaussian policy parameterization is unimodal. We propose \textbf{Di}verse \textbf{Skil}l \textbf{L}earning (Di-SkilL), an RL method for learning diverse skills using a Mixture of Experts, where each expert formalizes a skill as a contextual motion primitive. Di-SkilL optimizes each expert and its associated context distribution with respect to a maximum entropy objective that incentivizes learning diverse skills in similar contexts. The per-expert context distribution enables automatic curriculum learning, allowing each expert to focus on its best-performing sub-region of the context space. To overcome hard discontinuities and multi-modalities without any prior knowledge of the environment's unknown context probability space, we leverage energy-based models to represent the per-expert context distributions and demonstrate how we can efficiently train them using the standard policy gradient objective. We show on challenging robot simulation tasks that Di-SkilL can learn diverse and performant skills.
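
As a rough illustration of how an energy-based per-expert context distribution can induce a gating over experts, the following minimal sketch normalizes per-expert energies over a batch of sampled contexts and then applies Bayes' rule. It is not the authors' implementation: the linear energy function, the uniform expert prior, the 2-D contexts, and all function names are assumptions made only for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

def per_expert_context_probs(energies):
    # energies: (num_experts, num_contexts) scores on a batch of sampled contexts.
    # Returns p(c | o), normalized over the batch separately for each expert.
    return softmax(energies, axis=1)

def gating_probs(context_probs, expert_prior):
    # Bayes' rule on the batch: p(o | c) is proportional to p(c | o) * p(o).
    joint = context_probs * expert_prior[:, None]
    return joint / joint.sum(axis=0, keepdims=True)

rng = np.random.default_rng(0)
contexts = rng.uniform(-1.0, 1.0, size=(6, 2))  # hypothetical 2-D contexts
weights = rng.normal(size=(3, 2))               # hypothetical linear energy per expert
energies = weights @ contexts.T                 # shape (3 experts, 6 contexts)

p_c_given_o = per_expert_context_probs(energies)
p_o_given_c = gating_probs(p_c_given_o, np.full(3, 1.0 / 3.0))
```

Each row of `p_c_given_o` indicates which sampled contexts an expert prefers, while each column of `p_o_given_c` indicates how responsibility for a context is shared among experts. In an actual RL setting the energies would come from a learned model rather than fixed random weights.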

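Di-SkilL addresses diverse skill learning in reinforcement learning by using a Mixture of Experts and a maximum entropy objective to optimize each expert's context distribution, which incentivizes learning diverse skills in similar contexts. Energy-based models represent the per-expert context distributions and are trained efficiently with the standard policy gradient objective, which handles hard discontinuities and multi-modalities in the environment's unknown context probability space. On challenging robot simulation tasks, Di-SkilL learns diverse and performant skills.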

Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts