In Imitation Learning (IL), utilizing suboptimal and heterogeneous demonstrations presents a substantial challenge due to the varied nature of real-world data. However, standard IL algorithms consider these datasets as homogeneous, thereby inheriting the deficiencies of suboptimal demonstrators. Previous approaches to this issue typically rely on impractical assumptions like high-quality data subsets, confidence rankings, or explicit environmental knowledge. This paper introduces IRLEED, Inverse Reinforcement Learning by Estimating Expertise of Demonstrators, a novel framework that overcomes these hurdles without prior knowledge of demonstrator expertise. IRLEED enhances existing Inverse Reinforcement Learning (IRL) algorithms by combining a general model for demonstrator suboptimality to address reward bias and action variance, with a Maximum Entropy IRL framework to efficiently derive the optimal policy from diverse, suboptimal demonstrations. Experiments in both online and offline IL settings, with simulated and human-generated data, demonstrate IRLEED's adaptability and effectiveness, making it a versatile solution for learning from suboptimal demonstrations.

使用不完美和异构演示在模仿学习中存在相当大的挑战，本文介绍了一种名为IRLEED的新框架，通过估计演示者的专业水准，克服了现有逆强化学习算法中对不完善演示的缺陷，并结合最大熵逆强化学习框架从多样的不完善演示中高效地得出最优策略。通过在线和离线模仿学习设置以及模拟和人工生成的数据进行的实验表明，IRLEED具有适应性和有效性，成为从不完善演示中学习的通用解决方案。

通过估计演示者的专业知识进行逆强化学习