Despite recent progress in Reinforcement Learning for robotics applications,
many tasks remain prohibitively difficult to solve because of the high cost of
interaction. Transfer learning reduces training time in the target domain by
reusing knowledge learned in a source domain. Sim2Real transfer applies this
idea from a simulated robotic domain to a physical target domain, cutting the
time required to train a task in the physical world, where interactions are
expensive. However, most
existing approaches assume exact correspondence in the task structure and the
physical properties of the two domains. This work proposes a framework for
Few-Shot Policy Transfer between two domains through Observation Mapping and
Behavior Cloning. We use Generative Adversarial Networks (GANs) along with a
cycle-consistency loss to map the observations between the source and target
domains and later use this learned mapping to clone the successful source task
behavior policy to the target domain. We observe successful behavior policy
transfer with limited target task interactions and in cases where the source
and target tasks are semantically dissimilar.
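
To make the cycle-consistency idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes observations are plain vectors and uses fixed linear maps (`g_src_to_tgt`, `f_tgt_to_src`, both hypothetical names) in place of trained GAN generators. It shows only the round-trip reconstruction term, not the adversarial loss or the behavior cloning step.

```python
import numpy as np

def cycle_consistency_loss(src_obs, tgt_obs, g_src_to_tgt, f_tgt_to_src):
    """Mean L1 error after mapping observations to the other domain and back."""
    src_cycle = f_tgt_to_src(g_src_to_tgt(src_obs))  # source -> target -> source
    tgt_cycle = g_src_to_tgt(f_tgt_to_src(tgt_obs))  # target -> source -> target
    return (np.mean(np.abs(src_cycle - src_obs))
            + np.mean(np.abs(tgt_cycle - tgt_obs)))

# Hypothetical stand-ins for the two generators: an invertible linear map
# and two candidate reverse maps (the exact inverse and the identity).
A = np.array([[2.0, 1.0, 0.0, 0.0],
              [0.0, 2.0, 1.0, 0.0],
              [0.0, 0.0, 2.0, 1.0],
              [0.0, 0.0, 0.0, 2.0]])
A_inv = np.linalg.inv(A)

def g(x):
    return x @ A        # source -> target

def f_good(y):
    return y @ A_inv    # target -> source, consistent with g

def f_bad(y):
    return y            # target -> source, inconsistent with g

rng = np.random.default_rng(0)
src = rng.normal(size=(8, 4))  # batch of 8 source observations (assumed 4-D)
tgt = rng.normal(size=(8, 4))  # batch of 8 target observations

loss_good = cycle_consistency_loss(src, tgt, g, f_good)
loss_bad = cycle_consistency_loss(src, tgt, g, f_bad)
```

A pair of mappings that invert each other drives this term toward zero, while an inconsistent pair is penalized. In a full pipeline this term would be added to the adversarial losses during training.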

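This work proposes a framework for few-shot policy transfer between two domains through observation mapping and behavior cloning. It uses Generative Adversarial Networks (GANs) with a cycle-consistency loss to map observations between the source and target domains, then uses the learned mapping to clone the successful source task behavior policy to the target domain. This yields successful behavior policy transfer with limited target task interactions and when the source and target domains are semantically dissimilar.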

A Framework for Few-Shot Policy Transfer through Observation Mapping and Behavior Cloning

Domain adaptation in reinforcement learning (RL) mainly deals with the
changes in observations when transferring the policy to a new environment. Many
traditional approaches to domain adaptation in RL learn a mapping
function between the source and target domains, either explicitly or implicitly.
However, they typically require access to abundant data from the target domain.
Moreover, they often rely on visual cues to learn the mapping function and may
fail when the source domain looks quite different from the target domain. To
address these problems, we propose a novel framework, Online Prototype Alignment
(OPA), which learns the mapping function based on the functional similarity of
elements and achieves few-shot policy transfer within only
a few episodes. The key insight of OPA is to introduce an exploration
mechanism that can interact with the unseen elements of the target domain in an
efficient and purposeful manner, and then connect them with the seen elements
in the source domain according to their functionalities (rather than visual
cues). Experimental results show that when the target domain looks visually
different from the source domain, OPA achieves better transfer performance
than prior methods, even with far fewer samples from the target domain.
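
As a rough illustration of matching elements by function rather than appearance, the sketch below assigns each unseen target element to the source prototype with the nearest functional signature. This is not the OPA algorithm itself. The signature features (mean reward on contact, probability that the episode ends), the element names, and the function `align_to_prototypes` are all hypothetical choices made for this example.

```python
import numpy as np

def align_to_prototypes(target_signatures, source_prototypes):
    """Map each target element to the source prototype with the closest signature."""
    names = list(source_prototypes)
    protos = np.stack([source_prototypes[n] for n in names])
    mapping = {}
    for element, signature in target_signatures.items():
        # Euclidean distance from this element's signature to every prototype.
        dists = np.linalg.norm(protos - signature, axis=1)
        mapping[element] = names[int(np.argmin(dists))]
    return mapping

# Hypothetical signatures: [mean reward on contact, probability episode ends].
source_prototypes = {
    "coin": np.array([1.0, 0.0]),
    "enemy": np.array([-1.0, 1.0]),
    "wall": np.array([0.0, 0.0]),
}
# Signatures assumed to be estimated from a few exploratory interactions
# with elements that appear only in the target domain.
target_signatures = {
    "gem": np.array([0.9, 0.05]),
    "spike": np.array([-0.8, 0.9]),
}

mapping = align_to_prototypes(target_signatures, source_prototypes)
```

Under these assumed signatures, the unseen "gem" is treated like a source "coin" and the "spike" like an "enemy", regardless of how either looks on screen.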

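This work studies domain adaptation in reinforcement learning and proposes Online Prototype Alignment (OPA), a framework based on functional similarity. The framework achieves policy transfer within a few episodes and shows better transfer performance even when very few samples are available from the target domain.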