We present the first massively distributed architecture for deep reinforcement learning. This architecture uses four main components: parallel actors that generate new behaviour; parallel learners that are trained from stored experience; a distributed neural network to represent the value function or behaviour policy; and a distributed store of experience. We used our architecture to implement the Deep Q-Network algorithm (DQN). Our distributed algorithm was applied to 49 Atari 2600 games from the Arcade Learning Environment, using identical hyperparameters. Our performance surpassed non-distributed DQN in 41 of the 49 games and also reduced the wall-clock time required to achieve these results by an order of magnitude on most games.

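This paper introduces the first massively distributed architecture for deep reinforcement learning. It uses four main components: parallel actors, parallel learners, a distributed neural network, and a distributed experience store. Applying the Deep Q-Network algorithm to Atari 2600 games with this architecture, it surpasses non-distributed DQN in 41 of the 49 games and shortens the time needed to reach these results on most games.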

Massively Parallel Methods for Deep Reinforcement Learning