Reinforcement learning can train policies that effectively perform complex tasks. However, for long-horizon tasks, the performance of these methods degrades with horizon, often necessitating reasoning over and composing lower-level skills. Hierarchical reinforcement learning aims to enable this by providing a bank of low-level skills as action abstractions. Hierarchies can further improve on this by abstracting the state space as well. We posit that a suitable state abstraction should depend on the capabilities of the available lower-level policies. We propose Value Function Spaces: a simple approach that produces such a representation by using the value functions corresponding to each lower-level skill. These value functions capture the affordances of the scene, thus forming a representation that compactly abstracts task-relevant information and robustly ignores distractors. Empirical evaluations for maze-solving and robotic manipulation tasks demonstrate that our approach improves long-horizon performance and enables better zero-shot generalization than alternative model-free and model-based methods.
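
The representation described above can be illustrated with a minimal sketch: the abstract state is the vector of value estimates, one per low-level skill. Everything named here is an assumption made for illustration. The function `value_function_space`, the two toy skills, and the dictionary-based state are hypothetical, and the hand-written value functions stand in for the learned value functions the abstract refers to.

```python
import numpy as np

def value_function_space(state, skill_value_fns):
    """Stack each skill's value estimate for `state` into one vector.

    Entry i answers "how well can skill i do from this state?", so the
    vector summarizes the scene in terms of what the skills afford.
    """
    return np.array([v(state) for v in skill_value_fns], dtype=float)

# Hypothetical hand-written value functions; in practice these would be
# learned alongside each low-level skill policy.
def v_open_door(state):
    # High value only when the agent is at a door that is still closed.
    return 1.0 if state["near_door"] and not state["door_open"] else 0.0

def v_pick_key(state):
    # No value in picking up a key the agent already holds.
    if state["has_key"]:
        return 0.0
    return 0.9 if state["key_visible"] else 0.1

skills = [v_open_door, v_pick_key]

# Two raw states that differ only in a distractor (wall color).
s_red = {"near_door": True, "door_open": False, "has_key": False,
         "key_visible": True, "wall_color": "red"}
s_blue = dict(s_red, wall_color="blue")

z_red = value_function_space(s_red, skills)
z_blue = value_function_space(s_blue, skills)
```

In this toy setup both raw states map to the same two-dimensional vector, because neither skill's value depends on the wall color. That is the sense in which such a representation can ignore distractors while keeping task-relevant information; a higher-level policy would then operate on these vectors rather than on raw observations.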

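This paper examines how hierarchical reinforcement learning can address the performance degradation seen on long-horizon tasks, and proposes a state abstraction method called Value Function Spaces. The method uses the value function of each low-level skill to represent task-relevant information, which improves performance and zero-shot generalization on tasks such as maze solving and robotic manipulation.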

Value Function Spaces: Skill-Centric State Abstractions for Long-Horizon Reasoning