Global Optimality Guarantees For Policy Gradient Methods

Jun, 2019

Jalaj Bhandari, Daniel Russo

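TL;DR: This work examines which structural properties of a control problem allow policy gradient methods to reach a globally optimal solution. When these conditions are strengthened, the objective can be shown to satisfy a Polyak-Łojasiewicz condition, which yields faster convergence rates.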
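To make the Polyak-Łojasiewicz (PL) condition concrete: a function f with minimum value f* satisfies it when (1/2)·|f'(x)|² ≥ μ·(f(x) − f*) for some μ > 0, so every stationary point is a global minimum even if f is non-convex. The sketch below is a toy illustration only and is not taken from the paper. It assumes the one-dimensional function f(x) = x² + 3·sin²(x), a non-convex example commonly cited as satisfying the PL condition, and runs plain gradient descent with step size 1/L, where L = 8 bounds |f''|. The names `f`, `grad_f`, and `gradient_descent` are hypothetical helpers chosen for this example.

```python
import math


def f(x):
    # Toy non-convex objective (an assumption for illustration, not the paper's objective).
    return x * x + 3.0 * math.sin(x) ** 2


def grad_f(x):
    # d/dx [x^2 + 3 sin^2(x)] = 2x + 6 sin(x) cos(x) = 2x + 3 sin(2x)
    return 2.0 * x + 3.0 * math.sin(2.0 * x)


def gradient_descent(x0, step, num_steps):
    """Run fixed-step gradient descent and record the objective at every iterate."""
    x = x0
    values = [f(x)]
    for _ in range(num_steps):
        x -= step * grad_f(x)
        values.append(f(x))
    return x, values


# f''(x) = 2 + 6 cos(2x) lies in [-4, 8], so the gradient is 8-Lipschitz.
L_SMOOTH = 8.0
x_final, values = gradient_descent(x0=3.0, step=1.0 / L_SMOOTH, num_steps=200)
print(x_final, values[-1])
```

Although f is non-convex (its second derivative becomes negative), its only stationary point is the global minimizer x = 0, so the iterates are not trapped at a suboptimal point. The paper's contribution is identifying when policy gradient objectives share this kind of structure; the toy function here only illustrates the property.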

Abstract

Policy gradient methods are perhaps the most widely used class of reinforcement learning algorithms. These methods apply to complex, poorly understood control problems by performing stochastic gradient descent