Learning a single policy by imitation for a complex task that has multiple modes or hierarchical structure can be challenging. In fact, previous work has shown that when the modes are known, learning a separate policy for each mode or sub-task can greatly improve the performance of imitation learning. In this work, we use a directed graphical model to discover the interaction between sub-tasks from the state-action trajectory sequences they produce. We propose a new algorithm, based on the generative adversarial imitation learning framework, that automatically learns sub-task policies from unsegmented demonstrations. Our approach maximizes the directed information flow in the graphical model between the sub-task latent variables and their generated trajectories. We also show how our approach connects with the existing Options framework, which is commonly used to learn hierarchical policies.
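
For reference, the quantity being maximized can be sketched with the standard (Massey) definition of directed information. The notation here is an assumption made for illustration and is not fixed by the text above: x_t denotes the state-action pair at time t, \tau = x_{1:T} the trajectory, and c = c_{1:T} the sequence of sub-task latent variables.

```latex
% Directed information from the trajectory \tau to the latent sequence c
% (standard definition; symbols are illustrative assumptions).
\begin{align}
I(\tau \rightarrow c)
  &= \sum_{t=1}^{T} I\left(x_{1:t};\, c_t \mid c_{1:t-1}\right) \\
  &= H(c) - H(c \,\|\, \tau),
\qquad
H(c \,\|\, \tau) = \sum_{t=1}^{T} H\left(c_t \mid c_{1:t-1},\, x_{1:t}\right).
\end{align}
```

Unlike mutual information, each term conditions only on the trajectory up to time t, so the latent variable c_t depends on what has been observed so far and not on future states or actions.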

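This work proposes a new algorithm, built on the generative adversarial imitation learning framework, that uses a graphical model to learn sub-task policies from unsegmented demonstrations. It improves performance by maximizing the directed information flow in the graphical model between the sub-task latent variables and the trajectories they generate, and it connects the approach to Options, an existing framework for learning hierarchical policies.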

Directed-Info GAIL: Learning Hierarchical Policies from Unsegmented Demonstrations using Directed Information