Hierarchical Reinforcement Learning (HRL) agents have the potential to
demonstrate appealing capabilities such as planning and exploration with
abstraction, transfer, and skill reuse. Recent successes with HRL across
different domains provide evidence that practical, effective HRL agents are
possible, even if existing agents do not yet fully realize the potential of
HRL. Despite these successes, visually complex, partially observable 3D
environments have remained a challenge for HRL agents. We address this issue with
Hierarchical Hybrid Offline-Online (H2O2), a hierarchical deep reinforcement
learning agent that discovers and learns to use options from scratch using its
own experience. We show that H2O2 is competitive with a strong non-hierarchical
Muesli baseline in the DeepMind Hard Eight tasks and we shed new light on the
problem of learning hierarchical agents in complex environments. Our empirical
study of H2O2 reveals previously unnoticed practical challenges and brings a new
perspective to the current understanding of hierarchical agents in complex
domains.

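Using a hierarchical hybrid offline-online deep reinforcement learning agent, this work proposes a way to address the difficulty HRL agents face in visually complex, partially observable 3D environments. The agent is competitive with a non-hierarchical Muesli baseline on the DeepMind Hard Eight tasks, and the study reveals previously unnoticed practical challenges and offers a new perspective on hierarchical agents in complex domains.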