A Closer Look at Deep Policy Gradients

Nov, 2018

Are Deep Policy Gradient Algorithms Truly Policy Gradient Algorithms?

Andrew Ilyas, Logan Engstrom, Shibani Santurkar, Dimitris Tsipras, Firdaus Janoos...

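TL;DR: This work studies how the behavior of deep policy gradient algorithms reflects the conceptual framework that motivated their development, and proposes a fine-grained analysis of state-of-the-art methods. The results show that the behavior of these algorithms often deviates from what the motivating framework predicts. This points to gaps in our understanding of current methods and suggests that evaluation needs to move beyond the current benchmark-centric approach.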

Abstract

We study how the behavior of deep policy gradient algorithms reflects the conceptual framework motivating their development. We propose a fine-grained analysis of state-of-the-art methods based on key aspects of this framework: ...