We propose State Matching Offline DIstribution Correction Estimation (SMODICE), a novel and versatile algorithm for offline imitation learning (IL) via state-occupancy matching. We show that the SMODICE objective admits a simple optimization procedure through an application of Fenchel duality, as well as an analytic solution in tabular MDPs. Without requiring access to expert actions, SMODICE can be effectively applied to three offline IL settings: (i) imitation from observations (IfO), (ii) IfO with a dynamics- or morphologically-mismatched expert, and (iii) example-based reinforcement learning, which we show can be formulated as a state-occupancy matching problem. We extensively evaluate SMODICE on both gridworld environments and high-dimensional offline benchmarks. Our results demonstrate that SMODICE is effective for all three problem settings and significantly outperforms the prior state of the art.
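State-occupancy matching and its Fenchel dual can be illustrated on a small random tabular MDP. The following is a minimal sketch, not the authors' implementation, and it assumes: the KL-divergence instance of the objective, a known transition tensor P, a dataset occupancy d_O(s,a) with full support, and an expert state occupancy d_E(s) that is known exactly (in practice it would have to be estimated from expert observations). It uses the bound KL(d(s) || d_E(s)) <= -E_d[R(s)] + KL(d(s,a) || d_O(s,a)) with R(s) = log d_E(s)/d_O(s), and minimizes the resulting dual over a value function V with damped Newton steps. The Newton solver is our own choice and does not reproduce the analytic tabular solution mentioned above; the names `state_occupancy`, `kl`, `smodice_tabular`, and the random MDP are hypothetical.

```python
import numpy as np


def state_occupancy(P, pi, mu0, gamma):
    """Normalized discounted state occupancy d(s) of policy pi.

    P: (S, A, S) transitions, pi: (S, A) policy, mu0: (S,) initial distribution.
    Solves d = (1 - gamma) * mu0 + gamma * P_pi^T d.
    """
    P_pi = np.einsum("sa,sat->st", pi, P)  # state-to-state kernel under pi
    S = len(mu0)
    return (1 - gamma) * np.linalg.solve(np.eye(S) - gamma * P_pi.T, mu0)


def kl(p, q):
    """KL(p || q) for strictly positive probability vectors."""
    return float(np.sum(p * np.log(p / q)))


def smodice_tabular(P, mu0, gamma, d_O_sa, d_E_s, iters=200, tol=1e-10):
    """Hypothetical tabular solver for the KL instance of the dual.

    Minimizes over V:
      g(V) = (1 - gamma) mu0.V + log E_{d_O}[exp(R(s) + gamma (P V)(s, a) - V(s))]
    with R(s) = log d_E(s) / d_O(s); assumes d_O(s, a) > 0 everywhere.
    """
    S, A = d_O_sa.shape
    R = np.log(d_E_s / d_O_sa.sum(axis=1))  # state-only reward log d_E/d_O
    # Linear map V -> gamma * (P V)(s, a) - V(s), rows ordered (s, a).
    M = gamma * P.reshape(S * A, S) - np.repeat(np.eye(S), A, axis=0)
    base = np.log(d_O_sa.reshape(-1)) + np.repeat(R, A)

    def dual(V):
        z = base + M @ V
        m = z.max()
        w = np.exp(z - m)  # numerically stable log-sum-exp
        return (1 - gamma) * mu0 @ V + m + np.log(w.sum()), w / w.sum()

    V = np.zeros(S)
    g, p = dual(V)
    for _ in range(iters):
        grad = (1 - gamma) * mu0 + M.T @ p  # zero gradient <=> Bellman flow holds
        if np.linalg.norm(grad) < tol:
            break
        H = M.T @ (np.diag(p) - np.outer(p, p)) @ M
        # g is invariant to V + const, so H is singular along the ones vector;
        # adding ones * ones^T pins that direction without changing H @ step.
        step = np.linalg.solve(H + np.ones((S, S)), -grad)
        t = 1.0  # backtracking (Armijo) line search for damped Newton
        while True:
            g_new, p_new = dual(V + t * step)
            if g_new <= g + 1e-4 * t * (grad @ step) or t < 1e-12:
                break
            t *= 0.5
        V, g, p = V + t * step, g_new, p_new
    d_star = p.reshape(S, A)  # d*(s, a) proportional to d_O(s, a) exp(e_V(s, a))
    pi_star = d_star / d_star.sum(axis=1, keepdims=True)
    return d_star, pi_star, V


# Usage on a random MDP (all quantities below are made up for illustration).
rng = np.random.default_rng(0)
S, A, gamma = 6, 3, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a] is a distribution over s'
mu0 = rng.dirichlet(np.ones(S))
pi_O = rng.dirichlet(np.ones(A), size=S)  # behavior policy behind the dataset
pi_E = rng.dirichlet(0.3 * np.ones(A), size=S)  # expert; only its states are used
d_O_s = state_occupancy(P, pi_O, mu0, gamma)
d_O_sa = d_O_s[:, None] * pi_O
d_E_s = state_occupancy(P, pi_E, mu0, gamma)

d_star, pi_star, V = smodice_tabular(P, mu0, gamma, d_O_sa, d_E_s)
print("KL(d_O(s) || d_E(s)) =", kl(d_O_s, d_E_s))
print("KL(d*(s)  || d_E(s)) =", kl(d_star.sum(axis=1), d_E_s))
```

Only expert states enter the computation, mirroring the action-free setting. Because d_O is itself a feasible occupancy and the bound evaluated at d_O equals KL(d_O(s) || d_E(s)), the second printed value should never exceed the first; a zero dual gradient also means the extracted policy `pi_star` reproduces `d_star` exactly.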

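This paper proposes SMODICE, a novel offline imitation learning algorithm. It is a regression-based algorithm derived from state-occupancy matching, and it can be applied effectively to three offline imitation learning settings: imitation from observations, imitation with a dynamics- or morphologically-mismatched expert, and example-based reinforcement learning. The authors show that the SMODICE objective can be optimized with a simple procedure obtained through Fenchel duality and that it admits an analytic solution in tabular MDPs. They also evaluate SMODICE extensively on gridworld environments and high-dimensional offline benchmarks, showing that it is effective in all three problem settings and clearly outperforms the prior state of the art.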

Versatile Offline Imitation from Observations and Examples via Regularized State-Occupancy Matching