Hierarchical Imitation Learning (HIL) has been proposed to recover highly-complex behaviors in long-horizontal tasks from expert demonstrations by modeling the task hierarchy with the option framework. Existing methods either overlook the causal relationship between the subtask and its corresponding policy or fail to learn the policy in an end-to-end fashion, which leads to suboptimality. In this work, we develop a novel HIL algorithm based on Adversarial Inverse Reinforcement Learning and adapt it with the Expectation-Maximization algorithm in order to directly recover a hierarchical policy from the unannotated demonstrations. Further, we introduce a directed information term to the objective function to enhance the causality and propose a Variational Autoencoder framework for learning with our objectives in an end-to-end fashion. Theoretical justifications and evaluations on challenging robotic control tasks are provided to show the superiority of our algorithm. The codes are available at https://github.com/LucasCJYSDL/HierAIRL.

提出了一种多任务分层对抗逆强化学习方法(MH-AIRL),用于训练具有分层结构的多任务策略，以提高复合任务的表现，增强对复杂、长周期任务的训练效率，降低数据需求以及提高对专家演示的利用效率。实验证明，与现有算法相比，MH-AIRL表现更优。

基于上下文的多任务分层逆强化学习算法