Inverse reinforcement learning (IRL) infers a reward function from demonstrations, allowing for policy improvement and generalization. However, despite much recent interest in IRL, little work has been done to understand of the minimum set of demonstrations needed to teach a specific sequential decision-making task. We formalize the problem of finding optimal demonstrations for IRL as a machine teaching problem where the goal is to find the minimum number of demonstrations needed to specify the reward equivalence class of the demonstrator. We extend previous work on algorithmic teaching for sequential decision-making tasks by showing an equivalence to the set cover problem, and use this equivalence to develop an efficient algorithm for determining the set of maximally-informative demonstrations. We apply our proposed machine teaching algorithm to two novel applications: benchmarking active learning IRL algorithms and developing an IRL algorithm that, rather than assuming demonstrations are i.i.d., uses counterfactual reasoning over informative demonstrations to learn more efficiently.

该研究提出了一种基于机器教学的逆强化学习方法，利用最小数量的演示数据来学习策略并提高泛化性能。同时，还发展了一个新的学习方法，在一些应用中可以从信息丰富的演示数据中更加高效地学习到奖励函数。

逆强化学习的机器教学：算法与应用