We propose a conceptually simple and lightweight framework for deep
reinforcement learning that uses asynchronous gradient descent for optimization
of deep neural network controllers. We present asynchronous variants of four
standard reinforcement learning algorithms and show that parallel
actor-learners have a stabilizing effect on training allowing all four methods
to successfully train neural network controllers. The best performing method,
an asynchronous variant of actor-critic, surpasses the current state-of-the-art
on the Atari domain while training for half the time on a single multi-core CPU
instead of a GPU. Furthermore, we show that asynchronous actor-critic succeeds
on a wide variety of continuous motor control problems as well as on a new task
of navigating random 3D mazes using a visual input.

提出一种使用异步梯度下降法优化深度神经网络控制器的深度强化学习框架，演示了四种标准强化学习算法的异步变体，并表明并行 actor-learner 对训练具有稳定作用。其中最佳表现的方法，即 actor-critic 的异步变体，在 Atari 领域超越了现有的最佳表现，并且仅在单个多核 CPU 上训练一半的时间而不是 GPU。此外，还演示了异步 actor-critic 成功处理了各种连续运动控制问题以及使用视觉输入导航随机 3D 迷宫的新任务。