In Imitation Learning (IL), utilizing suboptimal and heterogeneous
demonstrations presents a substantial challenge due to the varied nature of
real-world data. Standard IL algorithms nevertheless treat such datasets as
homogeneous, thereby inheriting the deficiencies of suboptimal demonstrators.
Previous approaches to this issue typically rely on impractical assumptions
such as high-quality data subsets, confidence rankings, or explicit environmental
knowledge. This paper introduces IRLEED, Inverse Reinforcement Learning by
Estimating Expertise of Demonstrators, a novel framework that overcomes these
hurdles without prior knowledge of demonstrator expertise. IRLEED enhances
existing Inverse Reinforcement Learning (IRL) algorithms by combining two
components: a general model of demonstrator suboptimality that accounts for
reward bias and action variance, and a Maximum Entropy IRL framework that
efficiently derives the optimal policy from diverse, suboptimal
demonstrations. Experiments in both
online and offline IL settings, with simulated and human-generated data,
demonstrate IRLEED's adaptability and effectiveness, making it a versatile
solution for learning from suboptimal demonstrations.
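
To make the two sources of suboptimality named above concrete, the following is a minimal sketch of one way a single demonstrator could be modeled: a Boltzmann (softmax) action distribution over a shared reward, with a per-demonstrator reward offset standing in for reward bias and an inverse temperature standing in for action variance. The single-state setting, the names `demonstrator_policy`, `bias`, and `beta`, and the numeric values are all assumptions made for illustration; they are not taken from the paper and should not be read as its formulation.

```python
import numpy as np

def demonstrator_policy(reward, bias, beta):
    """Boltzmann action distribution for one demonstrator in a single state.

    reward: shared per-action reward (the quantity IRL tries to recover)
    bias:   hypothetical demonstrator-specific reward offset (reward bias)
    beta:   hypothetical inverse temperature; smaller beta gives noisier actions
    """
    logits = beta * (np.asarray(reward, dtype=float) + np.asarray(bias, dtype=float))
    logits = logits - logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# Illustrative values only: three actions, action 0 is best under the shared reward.
reward = np.array([1.0, 0.0, -1.0])

expert = demonstrator_policy(reward, bias=np.zeros(3), beta=5.0)   # precise, unbiased
noisy = demonstrator_policy(reward, bias=np.zeros(3), beta=0.5)    # high action variance
biased = demonstrator_policy(reward, bias=np.array([0.0, 2.0, 0.0]), beta=5.0)  # prefers action 1
```

In this sketch, lowering `beta` spreads probability mass across actions without changing which action is preferred, whereas a nonzero `bias` changes the preferred action itself. A method that estimates both quantities per demonstrator could, in principle, discount each effect when recovering the shared reward.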

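Using imperfect and heterogeneous demonstrations poses a considerable challenge in imitation learning. This paper introduces IRLEED, a new framework that estimates the expertise of demonstrators to overcome the shortcomings of existing inverse reinforcement learning algorithms when demonstrations are imperfect, and combines this with a Maximum Entropy IRL framework to efficiently derive the optimal policy from diverse, imperfect demonstrations. Experiments in online and offline imitation learning settings, with both simulated and human-generated data, show that IRLEED is adaptable and effective, making it a general solution for learning from imperfect demonstrations.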