Cross-domain imitation learning studies how to leverage expert demonstrations
of one agent to train an imitation agent with a different embodiment or
morphology. Comparing trajectories and stationary distributions between the
expert and imitation agents is challenging because they live on different
systems that may not even have the same dimensionality. We propose
Gromov-Wasserstein Imitation Learning (GWIL), a method for cross-domain
imitation that uses the Gromov-Wasserstein distance to align and compare states
between the different spaces of the agents. Our theory formally characterizes
the scenarios where GWIL preserves optimality, revealing its possibilities and
limitations. We demonstrate the effectiveness of GWIL in non-trivial continuous
control domains ranging from simple rigid transformations of the expert domain
to arbitrary transformations of the state-action space.

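This work proposes GWIL, a cross-domain imitation learning method based on the Gromov-Wasserstein distance. Its theory characterizes the scenarios in which GWIL preserves optimality, and experiments show that GWIL performs well under a range of transformations in continuous control domains.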

Cross-Domain Imitation Learning via Optimal Transport
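
To make the comparison described in the GWIL abstract above more concrete, the sketch below evaluates the discrete Gromov-Wasserstein objective for a fixed coupling between two point sets of different dimensionality. It is an illustration only, not the authors' implementation: the function names (`pairwise_dist`, `gw_cost`), the toy data, and the choice of Euclidean distances are assumptions made here, and the true Gromov-Wasserstein distance additionally minimizes this objective over all couplings, which the sketch does not do.

```python
import numpy as np

def pairwise_dist(points):
    """Euclidean distance matrix between the points of a single space."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def gw_cost(Cx, Cy, T):
    """Gromov-Wasserstein objective for a fixed coupling T:

        sum_{i,j,k,l} (Cx[i,k] - Cy[j,l])**2 * T[i,j] * T[k,l]

    The square is expanded so that no 4-index tensor is built.
    """
    p = T.sum(axis=1)  # marginal of T on the first space
    q = T.sum(axis=0)  # marginal of T on the second space
    term_x = p @ (Cx ** 2) @ p
    term_y = q @ (Cy ** 2) @ q
    cross = np.sum((Cx @ T @ Cy.T) * T)
    return term_x + term_y - 2.0 * cross

rng = np.random.default_rng(0)
n = 6
expert = rng.normal(size=(n, 2))  # toy "expert" states in 2-D

# Toy "agent" states in 3-D: the same points padded with a zero
# coordinate, then rotated and translated (an isometric embedding).
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta),  np.cos(theta), 0.0],
                [0.0,            0.0,           1.0]])
padded = np.hstack([expert, np.zeros((n, 1))])
agent = padded @ rot.T + np.array([1.0, -2.0, 0.5])

Cx, Cy = pairwise_dist(expert), pairwise_dist(agent)

aligned = np.eye(n) / n                  # matches point i to point i
shuffled = np.roll(aligned, 1, axis=1)   # matches point i to point i+1

print("aligned coupling :", gw_cost(Cx, Cy, aligned))
print("shuffled coupling:", gw_cost(Cx, Cy, shuffled))
```

Because the objective only compares distances within each space, the aligned coupling gives a cost of zero up to floating-point error even though the two point sets have different dimensions, while the mismatched coupling gives a strictly positive cost.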

Deep learning and reinforcement learning methods have recently been used to
solve a variety of problems in continuous control domains. An obvious
application of these techniques is dexterous manipulation in robotics, where
tasks are difficult to solve using traditional control theory or
hand-engineered approaches. One example of such a task is to grasp an object
and precisely stack it on another. Solving this difficult and practically
relevant problem in the real world is an important long-term goal for the field
of robotics. Here we take a step towards this goal by examining the problem in
simulation and providing models and techniques aimed at solving it. We
introduce two extensions to the Deep Deterministic Policy Gradient algorithm
(DDPG), a model-free, Q-learning-based method, which make it significantly more
data-efficient and scalable. Our results show that by making extensive use of
off-policy data and replay, it is possible to find control policies that
robustly grasp objects and stack them. Further, our results hint that it may
soon be feasible to train successful stacking policies by collecting
interactions on real robots.

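This work applies deep learning and reinforcement learning to dexterous manipulation tasks in robotics. It introduces two extensions to the DDPG algorithm that make it more data-efficient and scalable, and shows in simulation that policies can be found that robustly grasp and stack objects; the results hint that training such policies from interactions collected on real robots may soon be feasible.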