BriefGPT.xyz
Dec, 2017
Learning to Run with Actor-Critic Ensemble
Zhewei Huang, Shuchang Zhou, BoEr Zhuang, Xinyu Zhou
TL;DR
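Introduces Actor-Critic Ensemble (ACE), a method for improving the performance of the Deep Deterministic Policy Gradient (DDPG) algorithm. At inference time, ACE uses an ensemble of critics to select the best action from the proposals of multiple actors running in parallel, which helps it avoid actions with catastrophic consequences. The method took second place in the NIPS'17 Learning to Run competition.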
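The inference-time selection step summarized in the TL;DR can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `ace_select`, the toy actors and critics, and the choice to combine critic scores by a plain average are all assumptions made for the example.

```python
import numpy as np

def ace_select(state, actors, critics):
    """Pick the proposed action with the highest mean critic score.

    `actors` are callables state -> action; `critics` are callables
    (state, action) -> scalar Q estimate. Averaging the critics is an
    assumption of this sketch.
    """
    # Each actor proposes one candidate action for the current state.
    candidates = [actor(state) for actor in actors]
    # Score every candidate by averaging the Q estimates of all critics.
    scores = [
        float(np.mean([critic(state, action) for critic in critics]))
        for action in candidates
    ]
    best = int(np.argmax(scores))
    return candidates[best], scores

# Hypothetical stand-ins for trained networks, for illustration only.
state = np.array([0.5, -0.2])
actors = [
    lambda s: np.array([0.0]),
    lambda s: np.array([1.0]),
    lambda s: np.array([-1.0]),  # plays the role of a "bad" proposal
]
critics = [
    lambda s, a: -float((a[0] - 1.0) ** 2),
    lambda s, a: -float((a[0] - 0.8) ** 2),
]

action, scores = ace_select(state, actors, critics)
print(action, scores)
```

Because the selection is an argmax over a fixed set of deterministic proposals, the resulting policy is still deterministic; a larger candidate set simply gives the critics more chances to reject a poor action.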
Abstract
We introduce an actor-critic ensemble (ACE) method for improving the performance of deep deterministic policy gradient (DDPG) algorithm. At …