To increase autonomy in reinforcement learning, agents need to learn useful
behaviours without reliance on manually designed reward functions. To that end,
skill discovery methods have been used to learn the intrinsic options available
to an agent using task-agnostic objectives. However, without the guidance of
task-specific rewards, emergent behaviours are generally useless because skill
discovery is under-constrained in complex, high-dimensional spaces. This paper
proposes a framework for guiding skill discovery towards the subset of
expert-visited states using a learned state projection. We apply our method to
various reinforcement learning (RL) tasks and show that such a projection
results in more useful behaviours.
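
As a rough illustration of how a state projection can constrain skill discovery, the sketch below computes a mutual-information style intrinsic reward, log q(z | f(s)) - log p(z), on projected states f(s) rather than on raw states. The abstract does not specify the objective, so this form of the reward is an assumption. The linear projection, the distance-based discriminator, and all names (`project`, `skill_log_probs`, `intrinsic_reward`, `weights`, `skill_centres`) are hypothetical stand-ins, not the authors' implementation.

```python
import math
from typing import List, Sequence


def project(state: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    """Linear projection f(s) = W s. W stands in for a learned projection (assumption)."""
    return [sum(w * x for w, x in zip(row, state)) for row in weights]


def skill_log_probs(projected: Sequence[float],
                    skill_centres: Sequence[Sequence[float]]) -> List[float]:
    """Toy discriminator q(z | f(s)): log-softmax over negative squared distances."""
    logits = [-sum((p - c) ** 2 for p, c in zip(projected, centre))
              for centre in skill_centres]
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(l - top) for l in logits))
    return [l - log_norm for l in logits]


def intrinsic_reward(state: Sequence[float], skill: int,
                     weights: Sequence[Sequence[float]],
                     skill_centres: Sequence[Sequence[float]]) -> float:
    """Reward log q(z | f(s)) - log p(z), assuming a uniform prior p(z) over skills."""
    log_q = skill_log_probs(project(state, weights), skill_centres)
    return log_q[skill] - math.log(1.0 / len(skill_centres))


# Hypothetical projection that keeps the first two state dimensions and drops the third.
weights = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0]]
# Hypothetical per-skill centres in the projected space.
skill_centres = [[0.0, 0.0], [2.0, 0.0]]

reward_skill_1 = intrinsic_reward([2.0, 0.0, 7.0], 1, weights, skill_centres)
reward_skill_0 = intrinsic_reward([2.0, 0.0, 7.0], 0, weights, skill_centres)
```

In this toy setting the reward is positive for the skill whose centre matches the projected state and negative for the other, and it ignores the dropped third dimension entirely, which is the sense in which a projection narrows the space that skills are encouraged to cover.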

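This paper proposes using a learned state projection to guide skill discovery, so that reinforcement learning agents acquire more useful behaviours on specific tasks.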

Learning Task Agnostic Skills with Data-driven Guidance

We present a method for combining multi-agent communication and traditional
data-driven approaches to natural language learning, with an end goal of
teaching agents to communicate with humans in natural language. Our starting
point is a language model that has been trained on generic, not task-specific,
language data. We then place this model in a multi-agent self-play environment
that generates task-specific rewards used to adapt or modulate the model,
turning it into a task-conditional language model. We introduce a new way of
combining the two types of learning based on the idea of reranking language
model samples, and show that this method outperforms others in communicating
with humans in a visual referential communication task. Finally, we present a
taxonomy of different types of language drift that can occur, alongside a set
of measures to detect them.
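
The reranking idea can be sketched as follows: candidate utterances are sampled from the generic language model, and each is rescored by adding a task-specific score to its language model log-probability. This is a minimal sketch under stated assumptions, not the authors' implementation. The additive scoring rule, the `weight` parameter, the sample log-probabilities, and the word-overlap `toy_task_score` are all hypothetical. In a referential game the task score might instead come from a listener model's estimate of picking the target image.

```python
from typing import Callable, List, Tuple


def rerank(candidates: List[Tuple[str, float]],
           task_score: Callable[[str], float],
           weight: float = 1.0) -> List[Tuple[str, float]]:
    """Sort candidates by LM log-probability plus weighted task score, best first."""
    scored = [(text, lm_logp + weight * task_score(text))
              for text, lm_logp in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical samples from a generic language model: (utterance, log-probability).
samples = [
    ("a photo of an animal", -2.0),
    ("a brown dog on grass", -3.5),
    ("a dog", -2.5),
]


def toy_task_score(text: str) -> float:
    """Stand-in for a task reward: counts words that describe the target image."""
    target_words = {"brown", "dog", "grass"}
    return float(len(target_words & set(text.split())))


ranked = rerank(samples, toy_task_score, weight=1.0)
best_utterance = ranked[0][0]
```

Here the most generic sample has the highest language model log-probability but the lowest combined score, so reranking prefers the more task-informative utterance while still drawing only from language model samples.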

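This paper presents a method for natural language learning that combines multi-agent communication with traditional data-driven approaches. Task-specific rewards generated in a self-play environment are used to adapt or modulate the model, turning it into a task-conditional language model. The paper introduces a new method based on reranking language model samples, which outperforms other methods at communicating with humans in a visual referential communication task. Finally, it presents a taxonomy of different types of language drift, along with measures to detect them.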