Learning goal-directed behavior in environments with sparse feedback is a major challenge for reinforcement learning algorithms. The primary difficulty arises due to insufficient exploration, resulting in an agent being unable to learn robust value functions. Intrinsically motivated agents can explore new behavior for its own sake rather than to directly solve problems. Such intrinsic behaviors could eventually help the agent solve tasks posed by the environment. We present hierarchical-DQN (h-DQN), a framework to integrate hierarchical value functions, operating at different temporal scales, with intrinsically motivated deep reinforcement learning. A top-level value function learns a policy over intrinsic goals, and a lower-level function learns a policy over atomic actions to satisfy the given goals. h-DQN allows for flexible goal specifications, such as functions over entities and relations. This provides an efficient space for exploration in complicated environments. We demonstrate the strength of our approach on two problems with very sparse, delayed feedback: (1) a complex discrete MDP with stochastic transitions, and (2) the classic ATARI game `Montezuma's Revenge'.

文章介绍了一种名为Hierarchical-DQN的框架，结合了分层的值函数、内在动机和深度强化学习，在稀疏反馈的环境中，Hierarchical-DQN可以提供灵活的目标规定和高效的探索，通过在两个问题上的实验表明该方法的有效性。

分层深度强化学习：整合时间抽象和内在动机