Imitation learning (IL) has proven to be an effective method for learning good policies from expert demonstrations. Adversarial imitation learning (AIL), a subset of IL methods, is particularly promising, but its theoretical foundation in the presence of unknown transitions has yet to be fully developed. This paper explores the theoretical underpinnings of AIL in this context, where the stochastic and uncertain nature of environment transitions presents a challenge. We examine the expert sample complexity and interaction complexity required to recover good policies. To this end, we establish a framework connecting reward-free exploration and AIL, and propose an algorithm, MB-TAIL, that achieves the minimax optimal expert sample complexity of $\widetilde{O} (H^{3/2} |S|/\varepsilon)$ and interaction complexity of $\widetilde{O} (H^{3} |S|^2 |A|/\varepsilon^2)$. Here, $H$ represents the planning horizon, $|S|$ is the state space size, $|A|$ is the action space size, and $\varepsilon$ is the desired imitation gap. MB-TAIL is the first algorithm to achieve this level of expert sample complexity in the unknown transition setting and improves upon the interaction complexity of the best-known algorithm, OAL, by $O(H)$. Additionally, we demonstrate the generalization ability of MB-TAIL by extending it to the function approximation setting and proving that it can achieve expert sample and interaction complexity independent of $|S|$

本文旨在探究在不确定转移条件下对对手模仿学习策略的理论基础。我们提出一种算法MB-TAIL，它可以达到最小最大优化专家样本复杂度并改善了当前已知最佳算法OAL的交互复杂度。此外，我们通过将其扩展到函数逼近设置来展示MB-TAIL的泛化能力，并证明它可以实现独立于状态空间大小的专家样本和交互复杂度。

具有未知转换的可证明高效的对抗性模仿学习