We propose a novel hierarchical reinforcement learning framework for control with continuous state and action spaces. In our framework, the user specifies subgoal regions which are subsets of states; then, we (i) learn options that serve as transitions between these subgoal regions, and (ii) construct a high-level plan in the resulting abstract decision process (ADP). A key challenge is that the ADP may not be Markov, which we address by proposing two algorithms for planning in the ADP. Our first algorithm is conservative, allowing us to prove theoretical guarantees on its performance, which help inform the design of subgoal regions. Our second algorithm is a practical one that interweaves planning at the abstract level and learning at the concrete level. In our experiments, we demonstrate that our approach outperforms state-of-the-art hierarchical reinforcement learning algorithms on several challenging benchmarks.

提出一种新的基于连续状态和动作空间的控制的分层强化学习框架，其中用户指定状态的子集作为子目标区域，然后学习这些子目标区域之间的转换，并在生成的抽象决策过程(ADP)中构建高层计划，通过计划在抽象层和在具体层上的学习相结合的一个实际算法，优于现有的分层强化学习算法。

层次强化学习的抽象值迭代