BriefGPT.xyz
Dec, 2020
Sample Complexity of Policy Gradient Finding Second-Order Stationary Points
Long Yang, Qian Zheng, Gang Pan
TL;DR
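This study proposes a reinforcement-learning-based optimization method and uses second-order derivative techniques to prove that it converges to a second-order stationary point, which keeps the algorithm from getting stuck at saddle points or local minima.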
Abstract
The goal of policy-based reinforcement learning (RL) is to find the maximizer of its objective. However, because the objective is inherently non-concave, convergence to a first-order stationary point (FOSP) cannot guarantee the