Many complex real-world tasks are composed of several levels of sub-tasks. Humans leverage these hierarchical structures to accelerate learning and achieve better generalization. In this work, we study this inductive bias and propose the Ordered Memory Policy Network (OMPN) to discover subtask hierarchy by learning from demonstration. The discovered subtask hierarchy can be used to perform task decomposition, recovering the subtask boundaries in an unstructured demonstration. Experiments on Craft and Dial demonstrate that our model achieves higher task decomposition performance than strong baselines under both unsupervised and weakly supervised settings. OMPN can also be applied directly to partially observable environments and still achieves higher task decomposition performance. Our visualization further confirms that the subtask hierarchy can emerge in our model.

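This paper proposes the Ordered Memory Policy Network (OMPN), which discovers subtask hierarchy by learning the hierarchical structure present in demonstrations and then uses task decomposition to recover the subtask boundaries in unstructured demonstrations. Experiments show that, under both unsupervised and weakly supervised settings, OMPN achieves better task decomposition than strong baselines. OMPN can also be applied directly to partially observable environments, where it still achieves higher task decomposition performance.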

Learning Task Decomposition with Ordered Memory Policy Network