Many existing imitation learning datasets are collected from multiple demonstrators, each with different expertise at different parts of the environment. Yet, standard imitation learning algorithms typically treat all demonstrators as homogeneous, regardless of their expertise, absorbing the weaknesses of any suboptimal demonstrators. In this work, we show that unsupervised learning over demonstrator expertise can lead to a consistent boost in the performance of imitation learning algorithms. We develop and optimize a joint model over a learned policy and expertise levels of the demonstrators. This enables our model to learn from the optimal behavior and filter out the suboptimal behavior of each demonstrator. Our model learns a single policy that can outperform even the best demonstrator, and can be used to estimate the expertise of any demonstrator at any state. We illustrate our findings on real-robotic continuous control tasks from Robomimic and discrete environments such as MiniGrid and chess, out-performing competing methods in $21$ out of $23$ settings, with an average of $7\%$ and up to $60\%$ improvement in terms of the final reward.

本研究通过对演示者专业技能的无监督学习，开发了一种可同时学习演示者政策和专业技能水平的联合模型，并通过过滤每种演示者的次优行为，训练出可以优于任何演示者的单一策略，并可用于估计任意状态下演示者的专业技能，在Robomimic等实际机器人控制任务以及MiniGrid和棋类等离散环境中取得了比其他方法更好的表现。

通过估计演示者的专业水平进行模仿学习