BriefGPT.xyz
Jun, 2019
Global Optimality Guarantees For Policy Gradient Methods
Jalaj Bhandari, Daniel Russo
TL;DR
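This work identifies structural properties under which policy gradient methods are guaranteed to reach the global optimum, and shows that when these conditions are strengthened, the objective satisfies the Polyak-Łojasiewicz condition, which yields fast convergence rates.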
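For context, a differentiable function f with minimum value f* satisfies the Polyak-Łojasiewicz (PL) condition with constant μ > 0 if ½ f'(x)² ≥ μ (f(x) − f*) for all x. Under PL every stationary point is a global minimum, and if f is also L-smooth, gradient descent with step size 1/L satisfies f(x_{k+1}) − f* ≤ (1 − μ/L)(f(x_k) − f*), a linear rate, even when f is nonconvex. The sketch below is an illustration only, not taken from the paper: it uses the one-dimensional nonconvex function f(x) = x² + 3 sin²(x) as an assumed stand-in for a policy-gradient objective. L = 8 follows from |f''(x)| = |2 + 6 cos(2x)| ≤ 8, while μ = 1/32 is an assumed conservative constant that the code only spot-checks on a grid rather than proves.

```python
import math


def f(x):
    """Nonconvex test function f(x) = x^2 + 3 sin^2(x); its minimum is f* = 0 at x = 0."""
    return x * x + 3.0 * math.sin(x) ** 2


def grad_f(x):
    """Derivative f'(x) = 2x + 3 sin(2x)."""
    return 2.0 * x + 3.0 * math.sin(2.0 * x)


# |f''(x)| = |2 + 6 cos(2x)| <= 8, so f is L-smooth with L = 8.
L = 8.0
# Assumed conservative PL constant; pl_holds_on_grid spot-checks it numerically.
MU = 1.0 / 32.0


def pl_holds_on_grid(mu, lo=-10.0, hi=10.0, n=20001):
    """Check 0.5 * f'(x)^2 >= mu * (f(x) - f*) at n evenly spaced points (f* = 0)."""
    for k in range(n):
        x = lo + (hi - lo) * k / (n - 1)
        if 0.5 * grad_f(x) ** 2 < mu * f(x) - 1e-12:
            return False
    return True


def gradient_descent(x0, iterations=50):
    """Run gradient descent with step size 1/L and return f(x_k) for k = 0..iterations."""
    x = x0
    values = [f(x)]
    for _ in range(iterations):
        x -= grad_f(x) / L
        values.append(f(x))
    return values


if __name__ == "__main__":
    print("PL holds on grid:", pl_holds_on_grid(MU))
    vals = gradient_descent(x0=3.0)
    for k in (0, 1, 2, 5, 10):
        bound = (1.0 - MU / L) ** k * vals[0]
        print(f"k={k:2d}  f(x_k)={vals[k]:.3e}  PL bound={bound:.3e}")
```

The PL rate is only a worst-case guarantee; on this example the iterates settle near x = 0 far faster than (1 − μ/L)^k suggests, because the function is locally close to 4x² around its minimum.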
Abstract
Policy gradient methods are perhaps the most widely used class of reinforcement learning algorithms. These methods apply to complex, poorly understood control problems by performing stochastic gradient descent …
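To make "stochastic gradient descent over a class of policies" concrete, here is a hypothetical minimal sketch, not the paper's setup: a softmax policy over the arms of a bandit with fixed, assumed costs, trained with the score-function (REINFORCE) gradient estimator. For a sampled arm a with cost c, the quantity c ∇ log π_θ(a) is an unbiased estimate of the gradient of the expected cost J(θ) = Σ_a π_θ(a) c_a, and for a softmax policy ∇ log π_θ(a) = e_a − π_θ. The function name reinforce_bandit, the costs, the step size, and the iteration count are all illustrative assumptions.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(costs, step_size=0.1, iterations=5000, seed=0):
    """Minimize the expected cost of a softmax policy with score-function SGD.

    Hypothetical example: pi_theta(a) = softmax(theta)[a], and each step
    samples one arm a, then applies theta <- theta - step_size * c_a * (e_a - pi).
    Returns the final action probabilities.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(costs)
    for _ in range(iterations):
        probs = softmax(theta)
        # Sample an arm from the current policy.
        a = rng.choices(range(len(costs)), weights=probs)[0]
        c = costs[a]
        # Stochastic gradient descent step using c * grad log pi_theta(a).
        for i in range(len(theta)):
            score = (1.0 if i == a else 0.0) - probs[i]
            theta[i] -= step_size * c * score
    return softmax(theta)


if __name__ == "__main__":
    final_probs = reinforce_bandit([1.0, 0.6, 0.1])
    print([round(p, 3) for p in final_probs])
```

Because every cost here is positive and no baseline is subtracted, each sample lowers the probability of the arm just pulled by an amount proportional to its cost, so in expectation probability mass drifts toward the lowest-cost arm. Subtracting a baseline from c would reduce the variance of the estimate; it is left out to keep the estimator as small as possible.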