This paper studies generalized inverse reinforcement learning (GIRL) in Markov decision processes (MDPs): the problem of learning the basic components of an MDP from observed behavior (a policy) that might not be optimal. These components include not only the reward function and the transition probability matrices, but also the action space and the state space, which are not known exactly but are known to belong to given uncertainty sets. We address two key challenges in GIRL: first, quantifying the discrepancy between the observed policy and the underlying optimal policy; second, mathematically characterizing the underlying optimal policy when the basic components of the MDP are unobservable or only partially observable. We then propose a mathematical formulation of GIRL and develop a fast heuristic algorithm. Numerical results on both finite-state and infinite-state problems demonstrate the merit of our formulation and algorithm.

Towards Generalized Inverse Reinforcement Learning