One of the main challenges in reinforcement learning (RL) is generalisation.
In typical deep RL methods this is achieved by approximating the optimal value
function with a low-dimensional representation using a deep network. While this
approach works well in many domains, in domains where the optimal value
function cannot easily be reduced to a low-dimensional representation, learning
can be very slow and unstable. This paper contributes towards tackling such
challenging domains by proposing a new method called Hybrid Reward
Architecture (HRA). HRA takes as input a decomposed reward function and learns
a separate value function for each component reward function. Because each
component typically only depends on a subset of all features, the corresponding
value function can be approximated more easily by a low-dimensional
representation, enabling more effective learning. We demonstrate HRA on a
toy problem and the Atari game Ms. Pac-Man, where HRA achieves above-human
performance.
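
To make the idea of one value function per reward component concrete, the following is a minimal tabular sketch, not the architecture evaluated in the paper. The class name `TabularHRA`, its methods, the Q-learning style update for each component, and the choice to aggregate component values by summing them are all assumptions made for illustration; the abstract only states that a separate value function is learned for each component reward function.

```python
from typing import List, Sequence


class TabularHRA:
    """Hypothetical tabular sketch: one Q-table per reward component."""

    def __init__(self, n_states: int, n_actions: int, n_components: int,
                 alpha: float = 0.1, gamma: float = 0.9) -> None:
        self.n_actions = n_actions
        self.alpha = alpha  # learning rate
        self.gamma = gamma  # discount factor
        # q[k][s][a] is component k's value estimate for action a in state s.
        self.q = [[[0.0] * n_actions for _ in range(n_states)]
                  for _ in range(n_components)]

    def aggregate(self, state: int) -> List[float]:
        # Assumption: the overall action values are the sum of the
        # component values.
        return [sum(q_k[state][a] for q_k in self.q)
                for a in range(self.n_actions)]

    def act(self, state: int) -> int:
        # Greedy action with respect to the aggregated values.
        values = self.aggregate(state)
        return max(range(self.n_actions), key=values.__getitem__)

    def update(self, state: int, action: int, rewards: Sequence[float],
               next_state: int, done: bool) -> None:
        # Each component is trained only on its own reward signal
        # (assumed here: a standard Q-learning target per component).
        for q_k, r_k in zip(self.q, rewards):
            bootstrap = 0.0 if done else max(q_k[next_state])
            target = r_k + self.gamma * bootstrap
            q_k[state][action] += self.alpha * (target - q_k[state][action])


# Usage: two reward components, one terminal transition.
agent = TabularHRA(n_states=2, n_actions=2, n_components=2, alpha=0.5)
agent.update(state=0, action=1, rewards=[1.0, 0.0], next_state=1, done=True)
print(agent.aggregate(0), agent.act(0))
```

In this sketch each table sees only its own component of the reward, so a component that depends on few features has a correspondingly simple value function to learn, which is the intuition the abstract describes.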

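Summary: This paper introduces a new reinforcement learning method, Hybrid Reward Architecture (HRA), which addresses domains where the value function cannot easily be reduced to a low-dimensional representation by taking a decomposed reward function and learning a separate value function for each component. HRA is demonstrated on a toy problem and the Atari game Ms. Pac-Man, where it achieves above-human performance.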