Reinforcement learning (RL) is a powerful approach for acquiring a
good-performing policy. However, learning diverse skills is challenging in RL
due to the commonly used Gaussian policy parameterization. We propose
\textbf{Di}verse \textbf{Skil}l \textbf{L}earning (Di-SkilL), an RL method for
learning diverse skills using a Mixture of Experts, where each expert formalizes
a skill as a contextual motion primitive. Di-SkilL optimizes each expert and
its associated context distribution under a maximum entropy objective that
incentivizes learning diverse skills in similar contexts. The per-expert
context distribution enables automatic curriculum learning, allowing each expert
to focus on its best-performing sub-region of the context space. To overcome
hard discontinuities and multi-modalities without any prior knowledge of the
environment's unknown context probability space, we leverage energy-based
models to represent the per-expert context distributions and demonstrate how we
can efficiently train them using the standard policy gradient objective. We
show on challenging robot simulation tasks that Di-SkilL can learn diverse and
performant skills.
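The abstract does not say how the energy-based context distributions are parameterized or updated. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes three things: a finite batch of contexts sampled from the environment approximates each model's normalizer; the energy is a hypothetical linear function `E_phi(c) = -phi . f(c)`; and the update is the exact gradient of expected reward plus an entropy bonus over that batch, which is the quantity a score-function (REINFORCE) policy gradient estimator would approximate from samples. The names `EnergyContextDist` and `gating` are illustrative. Combining experts through Bayes' rule, p(o|c) ∝ p(c|o)p(o), is also an assumption, not something the abstract states.

```python
import math
from typing import Callable, List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


class EnergyContextDist:
    """Hypothetical per-expert context distribution p(c|o) ∝ exp(-E_phi(c)).

    Assumptions: a linear energy E_phi(c) = -phi . f(c), and a normalizer
    approximated over a finite batch of contexts sampled from the environment.
    """

    def __init__(self, features: Callable[[float], List[float]], n_features: int):
        self.features = features
        self.phi = [0.0] * n_features  # phi = 0 gives a uniform distribution

    def probs(self, contexts: List[float]) -> List[float]:
        # logit_i = -E_phi(c_i) = phi . f(c_i)
        logits = [sum(w * x for w, x in zip(self.phi, self.features(c)))
                  for c in contexts]
        return softmax(logits)

    def step(self, contexts: List[float], rewards: List[float],
             alpha: float = 0.1, lr: float = 1.0) -> None:
        """One gradient-ascent step on sum_i p_i * (R_i - alpha * log p_i),
        i.e. expected reward plus alpha times the entropy over the batch."""
        p = self.probs(contexts)
        adv = [r - alpha * math.log(pi) for r, pi in zip(rewards, p)]
        baseline = sum(pi * a for pi, a in zip(p, adv))
        for c, pi, a in zip(contexts, p, adv):
            g = pi * (a - baseline)  # d objective / d logit_i
            for k, x in enumerate(self.features(c)):
                self.phi[k] += lr * g * x  # chain rule: d logit_i / d phi = f(c_i)


def gating(dists: List[EnergyContextDist], prior: List[float],
           contexts: List[float]) -> List[List[float]]:
    """p(o|c_i) ∝ p(c_i|o) p(o) via Bayes' rule (assumed combination rule)."""
    per_expert = [d.probs(contexts) for d in dists]
    rows = []
    for i in range(len(contexts)):
        joint = [prior[o] * per_expert[o][i] for o in range(len(dists))]
        z = sum(joint)
        rows.append([j / z for j in joint])
    return rows


# Toy usage: expert 0 is rewarded on c > 0.5, expert 1 on c <= 0.5.
contexts = [i / 19 for i in range(20)]
high = [1.0 if c > 0.5 else 0.0 for c in contexts]
low = [1.0 - r for r in high]
experts = [EnergyContextDist(lambda c: [c], 1) for _ in range(2)]
for _ in range(300):
    experts[0].step(contexts, high)
    experts[1].step(contexts, low)
mass_high = sum(p for p, c in zip(experts[0].probs(contexts), contexts) if c > 0.5)
gate = gating(experts, [0.5, 0.5], contexts)
```

In this toy run, each expert's context distribution moves its mass toward the region where it is rewarded, while the entropy bonus keeps it from collapsing onto a single context. The Bayes-rule gate then routes high contexts to expert 0 and low contexts to expert 1. This gives a small-scale picture of each expert focusing on its best-performing sub-region of the context space.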

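Diverse skill learning in reinforcement learning: a mixture-of-experts method in which each expert and its context distribution are optimized under a maximum entropy objective that encourages learning diverse skills in similar contexts. Energy-based models represent the per-expert context distributions and are trained efficiently with the standard policy gradient objective. This addresses the hard discontinuities and multi-modalities of the environment's unknown context probability space. Experiments on challenging robot simulation tasks show that Di-SkilL can learn diverse and performant skills.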


Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts

Humans learn to master open-ended repertoires of skills by imagining and
practicing their own goals. This autotelic learning process, literally the
pursuit of self-generated (auto) goals (telos), becomes more and more
open-ended as the goals become more diverse, abstract and creative. The
resulting exploration of the space of possible skills is supported by an
inter-individual exploration: goal representations are culturally evolved and
transmitted across individuals, in particular using language. Current
artificial agents mostly rely on predefined goal representations corresponding
to goal spaces that are either bounded (e.g. list of instructions), or
unbounded (e.g. the space of possible visual inputs) but are rarely endowed
with the ability to reshape their goal representations, to form new
abstractions or to imagine creative goals. In this paper, we introduce a
language model augmented autotelic agent (LMA3) that leverages a pretrained
language model (LM) to support the representation, generation and learning of
diverse, abstract, human-relevant goals. The LM is used as an imperfect model
of human cultural transmission; an attempt to capture aspects of humans'
common-sense, intuitive physics and overall interests. Specifically, it
supports three key components of the autotelic architecture: 1)~a relabeler
that describes the goals achieved in the agent's trajectories, 2)~a goal
generator that suggests new high-level goals along with their decomposition
into subgoals the agent already masters, and 3)~reward functions for each of
these goals. Without relying on any hand-coded goal representations, reward
functions or curriculum, we show that LMA3 agents learn to master a large
diversity of skills in a task-agnostic text-based environment.
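The following toy Python sketch shows how the three components divide the work. Deterministic string rules stand in for the pretrained LM. Everything here is hypothetical and illustrative, not the LMA3 prompts or environment: the functions `relabel`, `generate_goal`, `reward`, and `autotelic_step`, the text events, and the rule that a new goal chains two mastered goals.

```python
from typing import List, Optional, Tuple


def relabel(trajectory: List[str]) -> List[str]:
    """Relabeler (stub for the LM): describe the goals achieved in a trajectory.
    Assumption: each text event is itself an achieved goal."""
    return list(trajectory)


def generate_goal(mastered: List[str]) -> Tuple[str, List[str]]:
    """Goal generator (stub for the LM): propose a new high-level goal together
    with its decomposition into subgoals the agent already masters."""
    first, second = mastered[0], mastered[1]
    return f"{first} then {second}", [first, second]


def reward(subgoals: List[str], trajectory: List[str]) -> bool:
    """Reward function (stub for the LM): the goal counts as achieved iff its
    subgoals occur in the trajectory in order."""
    remaining = iter(trajectory)
    # `x in iterator` consumes the iterator up to the first match,
    # so this checks that the subgoals form an ordered subsequence.
    return all(sub in remaining for sub in subgoals)


def autotelic_step(mastered: List[str],
                   trajectory: List[str]) -> Optional[Tuple[str, bool]]:
    """One pass through relabel -> generate -> reward; mutates `mastered`."""
    for goal in relabel(trajectory):
        if goal not in mastered:
            mastered.append(goal)
    if len(mastered) < 2:
        return None  # not enough mastered goals to compose a new one
    goal, subgoals = generate_goal(mastered)
    return goal, reward(subgoals, trajectory)


repertoire: List[str] = []
result = autotelic_step(repertoire, ["open fridge", "pick up apple"])
```

In the system the abstract describes, the pretrained LM plays each stub's role. That is what allows goals to be diverse and abstract rather than drawn from a fixed, hand-coded list.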

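This work introduces a language-model-augmented autotelic agent that uses a pretrained language model (LM) to support the generation and learning of diverse, abstract, human-relevant goals. Rather than relying on hand-coded goal representations, reward functions, or curricula, the agent learns to master a broad range of skills in a task-agnostic text-based environment.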