Many real-world Reinforcement Learning (RL) tasks have a complex and heterogeneous structure that makes end-to-end (or flat) approaches difficult to apply or even infeasible. Hierarchical Reinforcement Learning (HRL) addresses these problems through a multi-level decomposition of the task that makes its solution accessible. Although HRL is often used in practice, few works provide theoretical guarantees that justify its effectiveness. Thus, it is not yet clear when such approaches should be preferred to standard flat ones. In this work, we provide an option-dependent upper bound on the regret suffered by regret minimization algorithms in finite-horizon problems. We show that the performance improvement derives from the reduction of the planning horizon induced by the temporal abstraction that the hierarchical structure enforces. Then, focusing on a sub-setting of HRL approaches, the options framework, we highlight how the average duration of the available options affects the planning horizon and, consequently, the regret itself. Finally, we relax the assumption of having pre-trained options to show that, in particular situations, learning hierarchically from scratch can be preferable to using a standard approach.

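This paper studies how hierarchical reinforcement learning can address the overly long planning horizons of complex tasks. It provides an option-dependent upper bound on the regret and shows that the temporal abstraction induced by the hierarchical structure shortens the planning horizon and thereby improves learning performance. Building on this, it examines how, within the options framework, the average duration of the available options affects the planning horizon and the regret, and it relaxes the assumption of pre-trained options to show that, in particular situations, learning hierarchically from scratch can be preferable to a standard approach.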
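
As a rough illustration of the horizon-reduction idea, the following minimal sketch counts how many option-level decisions are needed to fill an episode of a fixed number of primitive steps. It is not the paper's bound or algorithm: the function name `count_decisions`, the deterministic durations, and the round-robin choice of options are all assumptions made for illustration only.

```python
from itertools import cycle


def count_decisions(horizon, option_durations):
    """Count option-level decisions needed to fill `horizon` primitive steps.

    Hypothetical helper: options are picked round-robin and each one runs
    for its fixed duration, truncated at the end of the episode.
    """
    if horizon < 0 or not option_durations:
        raise ValueError("horizon must be >= 0 and durations non-empty")
    if any(d < 1 for d in option_durations):
        raise ValueError("every option must last at least one step")

    steps_used = 0
    decisions = 0
    for duration in cycle(option_durations):
        if steps_used >= horizon:
            break
        # An option cannot run past the end of the finite-horizon episode.
        steps_used += min(duration, horizon - steps_used)
        decisions += 1
    return decisions


# Flat agent: every primitive action is a decision (duration 1).
flat_decisions = count_decisions(12, [1])
# Agent with options lasting 3 steps each: fewer decisions per episode.
option_decisions = count_decisions(12, [3])
```

Under these assumptions, the number of decision points per episode shrinks roughly in proportion to the average option duration, which is the intuition behind the shorter planning horizon discussed above.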

An Option-Dependent Analysis of Regret Minimization Algorithms in Finite-Horizon Semi-Markov Decision Processes