Evolutionary Algorithms and Deep Reinforcement Learning have both successfully solved control problems across a variety of domains. Recently, algorithms have been proposed that combine the two methods, aiming to leverage the strengths and mitigate the weaknesses of each. In this paper we introduce a new Evolutionary Reinforcement Learning model that combines a particular family of Evolutionary Algorithms, Evolutionary Strategies, with the off-policy Deep Reinforcement Learning algorithm TD3. Instead of a single shared replay buffer, the framework uses a multi-buffer system. This allows the Evolutionary Strategy to search freely in the space of policies without overpopulating the replay buffer with poorly performing trajectories. Such trajectories would otherwise crowd out examples of desirable policy behaviour and so limit what the Deep Reinforcement Learning component can learn within the shared framework. The proposed algorithm performs competitively with current Evolutionary Reinforcement Learning algorithms on MuJoCo control tasks, outperforming the well-known state-of-the-art CEM-RL on 3 of the 4 environments tested.
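
To illustrate the multi-buffer idea, the following is a minimal Python sketch and not the implementation used in this work. It assumes two bounded buffers, one filled by the Evolutionary Strategy population and one filled by the TD3 agent, and a training batch drawn from both in a fixed proportion. The class name `MultiBuffer`, the source labels `"es"` and `"rl"`, and the `rl_fraction` parameter are all hypothetical. The abstract does not specify how many buffers are used or how batches are composed.

```python
import random
from collections import deque


class MultiBuffer:
    """Hypothetical replay store with one bounded buffer per data source."""

    def __init__(self, capacity=10_000, seed=0):
        # Separate stores: a flood of ES transitions can only evict other
        # ES transitions, never the RL agent's own experience.
        self.buffers = {
            "es": deque(maxlen=capacity),
            "rl": deque(maxlen=capacity),
        }
        self.rng = random.Random(seed)

    def add(self, source, transition):
        """Store a transition in the buffer belonging to `source`."""
        self.buffers[source].append(transition)

    def sample(self, batch_size, rl_fraction=0.5):
        """Draw a batch mixing both sources (the mixing rule is an assumption)."""
        n_rl = min(int(batch_size * rl_fraction), len(self.buffers["rl"]))
        n_es = min(batch_size - n_rl, len(self.buffers["es"]))
        batch = self.rng.sample(list(self.buffers["rl"]), n_rl)
        batch += self.rng.sample(list(self.buffers["es"]), n_es)
        return batch


# Usage: placeholder tuples stand in for (state, action, reward, next_state, done).
buf = MultiBuffer(capacity=100)
for i in range(1000):          # many exploratory ES transitions
    buf.add("es", ("es", i))
for i in range(50):            # fewer transitions from the RL agent
    buf.add("rl", ("rl", i))
batch = buf.sample(32, rl_fraction=0.5)
```

With a single shared buffer of the same capacity, whichever source wrote most recently could evict the other's transitions. Here the 1000 exploratory transitions only ever displace older entries in their own buffer, so all 50 transitions from the RL agent remain available for sampling.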

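This paper proposes a new Evolutionary Reinforcement Learning model that combines an evolutionary algorithm called Evolutionary Strategies with the off-policy Deep Reinforcement Learning algorithm TD3, and that searches using a multi-buffer system rather than a single shared replay buffer. The algorithm achieves competitive performance on MuJoCo control tasks, outperforming the well-known state-of-the-art CEM-RL on 3 of the tested environments.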

Evolutionary Strategy Guided Reinforcement Learning via Multi-Buffer Communication