Imitation learning (IL) seeks to teach agents specific tasks through expert
demonstrations. One of the key approaches to IL is to define a distance between
agent and expert and to find an agent policy that minimizes that distance.
Optimal transport methods have been widely used in imitation learning as they
provide ways to measure meaningful distances between agent and expert
trajectories. However, the problem of how to optimally combine multiple expert
demonstrations has not been widely studied. The standard method is to simply
concatenate state(-action) trajectories, which is problematic when
trajectories are multi-modal. We propose an alternative method that uses a
multi-marginal optimal transport distance and enables the combination of
multiple and diverse state-trajectories in the OT sense, providing a more
sensible geometric average of the demonstrations. Our approach enables an agent
to learn from several experts. We analyze its efficiency on OpenAI Gym control
environments and show that the standard method is not always optimal.
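
To illustrate why an OT-style average can differ from a naive combination, the
following is a minimal sketch, not the method of the paper: it replaces the
multi-marginal formulation with a simpler, related technique, a uniform-weight
free-support Wasserstein barycenter computed by fixed-point iteration. The
function name `free_support_barycenter`, the equal-length assumption, and the
toy trajectories are all hypothetical choices made for this example.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def free_support_barycenter(trajectories, n_iters=20):
    """Uniform-weight Wasserstein barycenter of equal-size point clouds.

    Assumption: every trajectory is an (n, d) array with the same n, and
    each state carries mass 1/n. Time ordering is ignored; states are
    treated as an unordered point cloud.
    """
    trajs = [np.asarray(t, dtype=float) for t in trajectories]
    bary = trajs[0].copy()  # initialise the support from the first expert
    for _ in range(n_iters):
        matched = []
        for t in trajs:
            # Squared Euclidean cost between barycenter points and states.
            cost = ((bary[:, None, :] - t[None, :, :]) ** 2).sum(axis=-1)
            # With uniform weights and equal sizes, the optimal transport
            # plan is a permutation, found by linear assignment.
            _, cols = linear_sum_assignment(cost)
            matched.append(t[cols])  # row i is the state matched to bary[i]
        # Move each barycenter point to the mean of its matched states.
        bary = np.mean(matched, axis=0)
    return bary


# Two toy "expert" trajectories in a 2-D state space (hypothetical data).
expert_a = np.array([[0.0, 0.0], [1.0, 0.0]])
expert_b = np.array([[0.0, 2.0], [1.0, 2.0]])
average = free_support_barycenter([expert_a, expert_b])
# average is [[0, 1], [1, 1]]: each point sits midway between matched states.
```

Concatenating the two toy trajectories would instead keep all four states as
targets, whereas the barycenter yields a single trajectory-sized point cloud
lying between the experts.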

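An alternative method based on a multi-marginal optimal transport distance
combines multiple, diverse state trajectories in the OT sense and provides a
more sensible geometric average of the demonstrations, allowing the agent to
learn from several experts. An efficiency analysis on OpenAI Gym control
environments shows that the standard method is not always optimal.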

On Combining Expert Demonstrations in Imitation Learning via Optimal Transport

We will consider all policies of the agent and will prove that one of them is
the best-performing policy. While that policy is not computable, computable
policies do exist in its proximity. We will define AI as a computable policy
that is sufficiently close to the best-performing policy. Before we can define
the agent's best-performing policy, we need a language for describing the
world. We will also use this language to develop a program that satisfies the
AI definition. The program will first understand the world by describing it in
the selected language. It will then use that description to predict the future
and select the best possible move. While this program is extremely inefficient
and practically unusable, it can be improved by refining both the language for
describing the world and the algorithm used to predict the future. This can
yield a program that is both efficient and consistent with the AI definition.

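By defining a computable policy and a language for describing the world, the
authors develop a program that predicts the future and selects the best move.
Refining the description language and the prediction algorithm can make the
program more efficient while keeping it consistent with the AI definition.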