Learning to Run with Actor-Critic Ensemble

Dec, 2017

Learning to Run with Actor-Critic Ensemble

Zhewei Huang, Shuchang Zhou, BoEr Zhuang, Xinyu Zhou

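TL;DR: Introduces Actor-Critic Ensemble (ACE), a method for improving the performance of the Deep Deterministic Policy Gradient (DDPG) algorithm. At inference time, ACE uses an ensemble of critics to select the best action from the proposals of multiple actors running in parallel, which helps avoid actions with catastrophic consequences. The method won second place in the NIPS'17 Learning to Run competition.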
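The inference-time selection step can be illustrated with a minimal Python sketch. It is not the authors' implementation. The name `ace_select_action`, the toy actors and critics, and the choice to combine critic scores by taking their mean are all assumptions made for illustration. In practice the actors and critics would be trained DDPG networks.

```python
import numpy as np


def ace_select_action(state, actors, critics):
    """Return the proposed action with the highest mean critic score.

    Hypothetical helper: each actor maps a state to an action, and each
    critic maps (state, action) to a scalar value estimate. Averaging the
    critic scores is an assumed way of combining the ensemble.
    """
    # Every actor proposes one candidate action for the current state.
    proposals = [actor(state) for actor in actors]
    # Score each candidate by the mean estimate over all critics.
    scores = [
        float(np.mean([critic(state, action) for critic in critics]))
        for action in proposals
    ]
    best = int(np.argmax(scores))
    return proposals[best], best


# Toy stand-ins for trained networks (illustrative only).
toy_actors = [
    lambda s: s + 1.0,
    lambda s: s - 1.0,
    lambda s: np.zeros_like(s),
]
toy_critics = [
    lambda s, a: -float(np.sum((a - 0.4) ** 2)),
    lambda s, a: -float(np.sum((a - 0.6) ** 2)),
]

state = np.array([0.5])
action, index = ace_select_action(state, toy_actors, toy_critics)
print(index, action)
```

In this toy setup both critics prefer actions near 0.5, so the third actor's proposal is selected and the two proposals that overshoot are rejected.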

Abstract

We introduce an actor-critic ensemble (ACE) method for improving the performance of the deep deterministic policy gradient (DDPG) algorithm. At