Deep reinforcement learning has achieved super-human performance on complex
reinforcement learning (RL) tasks using only raw pixels as input. However, it
fails to reuse knowledge from previously learnt tasks to solve new, unseen
ones. Generalizing and reusing knowledge are fundamental requirements for
creating a truly intelligent agent. This work proposes a general method for
one-to-one transfer learning based on a generative adversarial network model
tailored to RL tasks.

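This work proposes a one-to-one transfer learning method based on a generative adversarial network model, aimed at knowledge reuse and generalization to new tasks in deep reinforcement learning.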

Learning state correspondence of reinforcement learning tasks for knowledge transfer

We describe a simple scheme that allows an agent to learn about its
environment in an unsupervised manner. Our scheme pits two versions of the same
agent, Alice and Bob, against one another. Alice proposes a task for Bob to
complete, and then Bob attempts to complete it. In this work we focus on two
kinds of environments: (nearly) reversible environments and environments that
can be reset. Alice "proposes" the task by performing a sequence of actions,
and then Bob must undo or repeat them, respectively. Via an
appropriate reward structure, Alice and Bob automatically generate a curriculum
of exploration, enabling unsupervised training of the agent. When Bob is
deployed on an RL task within the environment, this unsupervised training
reduces the number of supervised episodes needed to learn, and in some cases
converges to a higher reward.
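
The self-play loop described above can be illustrated with a minimal sketch of the "reset" variant. Everything concrete here is an assumption made for illustration and is not stated in the abstract: the toy 1-D chain environment, the step budget `t_max`, the helper names (`self_play_rewards`, `run_reset_episode`, `greedy_bob`), and in particular the reward form, in which Bob is penalised in proportion to the time he takes and Alice is rewarded when Bob needs more time than she did.

```python
import random


def self_play_rewards(t_alice, t_bob, gamma=0.1):
    """Assumed reward form (not given in the abstract).

    Bob is penalised for the time he takes; Alice is rewarded only when
    Bob needs more steps than she used, which pushes her toward tasks
    that are just beyond Bob's current ability.
    """
    reward_bob = -gamma * t_bob
    reward_alice = gamma * max(0, t_bob - t_alice)
    return reward_alice, reward_bob


def run_reset_episode(alice_policy, bob_policy, n_states=10, t_max=20, rng=None):
    """Run one self-play episode on a toy 1-D chain that can be reset."""
    rng = rng or random.Random(0)

    # Alice acts from the start state until she stops or the budget runs out.
    state, t_alice = 0, 0
    while t_alice < t_max:
        action = alice_policy(state, rng)  # -1, +1, or None meaning "stop"
        if action is None:
            break
        state = min(max(state + action, 0), n_states - 1)
        t_alice += 1
    target = state  # the task Alice "proposes" is to reach this state

    # The environment is reset and Bob must repeat Alice's result.
    state, t_bob = 0, 0
    budget = t_max - t_alice
    while state != target and t_bob < budget:
        action = bob_policy(state, target, rng)
        state = min(max(state + action, 0), n_states - 1)
        t_bob += 1
    # If Bob never reaches the target, t_bob equals the full remaining budget.

    return t_alice, t_bob, self_play_rewards(t_alice, t_bob)


def random_alice(state, rng):
    """Hypothetical untrained Alice: moves left, right, or stops at random."""
    return rng.choice([-1, 1, None])


def greedy_bob(state, target, rng):
    """Hypothetical competent Bob: always steps toward the target."""
    return 1 if target > state else -1


if __name__ == "__main__":
    t_a, t_b, (r_a, r_b) = run_reset_episode(random_alice, greedy_bob)
    print(f"Alice steps: {t_a}, Bob steps: {t_b}, rewards: {r_a:.2f}, {r_b:.2f}")
```

In a real training setup both policies would be learned from these rewards rather than fixed; the fixed policies here only show how the episode and the reward bookkeeping fit together.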

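Proposes a curriculum generated by the interplay of two agents, Alice and Bob; with an appropriate reward mechanism, it enables effective unsupervised reinforcement learning for training an agent to learn about its environment.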