We introduce a novel approach to hierarchical reinforcement learning for Linearly-solvable Markov Decision Processes (LMDPs) in the infinite-horizon average-reward setting. Unlike previous work, our approach allows low-level and high-level tasks to be learned simultaneously, without imposing limiting restrictions on the low-level tasks. Our method partitions the state space into smaller subtasks that are easier to solve, and exploits equivalences between these partitions to learn more efficiently. We then exploit the compositionality of low-level tasks to exactly represent the value function of the high-level task. Experiments show that our approach can outperform flat average-reward reinforcement learning by one or several orders of magnitude.
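As background, the flat (non-hierarchical) average-reward LMDP that serves as the baseline can be sketched in a few lines. This is an illustrative sketch only, not the hierarchical method of this work. It assumes a state-dependent reward vector `r`, an ergodic and aperiodic passive transition matrix `P`, and unit temperature; the function name `solve_average_reward_lmdp` and the toy three-state chain are hypothetical. Under these assumptions the exponentiated value function `z = exp(v)` satisfies the linear eigenvector equation `exp(rho) * z = diag(exp(r)) @ P @ z`, where `rho` is the optimal average reward (gain), so power iteration is enough to solve the problem.

```python
import numpy as np


def solve_average_reward_lmdp(P, r, iters=10_000, tol=1e-12):
    """Power iteration for exp(rho) * z = diag(exp(r)) @ P @ z.

    P: (n, n) passive transition matrix (rows sum to 1), assumed ergodic.
    r: (n,) state rewards.
    Returns the gain rho, the value function v = log z (defined up to an
    additive constant), and the optimal transition matrix.
    """
    M = np.exp(r)[:, None] * P           # diag(exp(r)) @ P
    z = np.full(len(r), 1.0 / len(r))    # positive start vector, sums to 1
    eig = 1.0
    for _ in range(iters):
        z_new = M @ z
        eig = z_new.sum()                # eigenvalue estimate, since sum(z) == 1
        z_new /= eig                     # renormalise so entries sum to 1
        converged = np.max(np.abs(z_new - z)) < tol
        z = z_new
        if converged:
            break
    rho = np.log(eig)
    # Optimal policy reweights the passive dynamics by z at the next state.
    policy = P * z[None, :]
    policy /= policy.sum(axis=1, keepdims=True)
    return rho, np.log(z), policy


# Toy example: lazy random walk on three states with decreasing rewards.
P = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
r = np.array([0.0, -1.0, -2.0])
rho, v, policy = solve_average_reward_lmdp(P, r)
```

The optimal policy shifts probability mass toward successor states with larger `z`, which is the linearity that the compositional, hierarchical construction described above builds on.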

Hierarchical Average-Reward Linearly-solvable Markov Decision Processes