We propose a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers. We present asynchronous variants of four standard reinforcement learning algorithms and show that parallel actor-learners have a stabilizing effect on training, allowing all four methods to successfully train neural network controllers. The best-performing method, an asynchronous variant of actor-critic, surpasses the current state of the art on the Atari domain while training for half the time on a single multi-core CPU instead of a GPU. Furthermore, we show that asynchronous actor-critic succeeds on a wide variety of continuous motor control problems as well as on a new task of finding rewards in random 3D mazes from visual input.

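The paper proposes a deep reinforcement learning framework that uses asynchronous gradient descent to optimize deep neural network controllers. It presents asynchronous variants of four standard reinforcement learning algorithms and shows that parallel actor-learners have a stabilizing effect on training. The best-performing method, an asynchronous variant of actor-critic, surpasses the existing state of the art on the Atari domain while training for half the time on a single multi-core CPU rather than a GPU. It also shows that asynchronous actor-critic succeeds on a variety of continuous motor control problems and on a new task of navigating random 3D mazes from visual input.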
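To make the idea of asynchronous gradient descent with parallel workers concrete, here is a minimal sketch. It is not the paper's algorithm: it swaps the reinforcement learning objective for a toy regression problem (fitting y = 3x - 1), and the names `async_sgd`, `shared`, and `worker`, along with all hyperparameter values, are assumptions made for illustration. What it does show is the general pattern: several threads read a possibly stale copy of shared parameters, accumulate a gradient over a few samples, and write the update back without a lock.

```python
import random
import threading


def async_sgd(num_workers=4, updates_per_worker=2000, batch=5, lr=0.05):
    """Fit y = 3x - 1 with lock-free asynchronous SGD on shared parameters.

    Toy illustration only: a supervised objective stands in for the
    reinforcement learning losses discussed in the text.
    """
    shared = [0.0, 0.0]  # [w, b], shared by all worker threads

    def worker(seed):
        rng = random.Random(seed)  # each worker draws its own data stream
        for _ in range(updates_per_worker):
            w, b = shared[0], shared[1]  # snapshot; may already be stale
            grad_w = 0.0
            grad_b = 0.0
            for _ in range(batch):
                x = rng.uniform(-1.0, 1.0)
                err = (w * x + b) - (3.0 * x - 1.0)
                grad_w += err * x  # d/dw of 0.5 * err**2
                grad_b += err      # d/db of 0.5 * err**2
            # Apply the accumulated gradient without taking a lock.
            shared[0] -= lr * grad_w / batch
            shared[1] -= lr * grad_b / batch

    threads = [
        threading.Thread(target=worker, args=(seed,))
        for seed in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared


if __name__ == "__main__":
    w, b = async_sgd()
    print(f"w = {w:.4f}, b = {b:.4f}")
```

Calling `async_sgd()` should return values close to `[3.0, -1.0]`. Because the updates are unsynchronized, individual writes can occasionally overwrite each other, but on this simple objective the shared parameters still converge. In CPython the threads interleave rather than run truly in parallel, so the sketch demonstrates the update pattern rather than any speedup.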

Asynchronous Methods for Deep Reinforcement Learning