To obtain the optimal constraints in complex environments, Inverse Constrained Reinforcement Learning (ICRL) seeks to recover these constraints from expert demonstrations in a data-driven manner. Existing ICRL algorithms collect training samples from an interactive environment. However, the efficacy and efficiency of these sampling strategies remain unknown. To bridge this gap, we introduce a strategic exploration framework with guaranteed efficiency. Specifically, we define a feasible constraint set for ICRL problems and investigate how expert policy and environmental dynamics influence the optimality of constraints. Motivated by our findings, we propose two exploratory algorithms to achieve efficient constraint inference via 1) dynamically reducing the bounded aggregate error of cost estimation and 2) strategically constraining the exploration policy. Both algorithms are theoretically grounded with tractable sample complexity. We empirically demonstrate the performance of our algorithms under various environments.

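This paper addresses the unknown efficiency of existing sampling strategies in Inverse Constrained Reinforcement Learning (ICRL). It proposes an exploration framework with guaranteed efficiency, together with two algorithms that achieve efficient constraint inference by dynamically reducing the bounded aggregate error of cost estimation and by strategically constraining the exploration policy. Experimental results show that these algorithms perform strongly across a variety of environments.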

Provably Efficient Exploration in Inverse Constrained Reinforcement Learning