Oct 2021
On the Global Convergence of Momentum-based Policy Gradient
Yuhao Ding, Junzi Zhang, Javad Lavaei
TL;DR
This paper studies the global convergence of stochastic policy gradient (PG) methods that incorporate a momentum term, and shows that adding momentum improves the sample complexity of reaching global optimality for PG methods under the softmax and non-degenerate Fisher policy parameterizations. The authors also provide a general framework for analyzing the global convergence rates of stochastic PG methods.
Abstract
Policy gradient (PG) methods are popular and efficient for large-scale reinforcement learning due to their relative stability and incremental nature. In recent years, the empirical success of PG methods has led to …
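To make the momentum idea from the TL;DR concrete, below is a minimal sketch of a momentum-based stochastic policy gradient update on a toy bandit problem with a softmax-parameterized policy. It uses a plain heavy-ball style average of REINFORCE gradient estimates rather than the exact variance-reduced estimator analyzed in the paper, and all names and hyperparameters (`true_means`, `beta`, `step_size`, etc.) are illustrative assumptions.

```python
# Minimal sketch (not the authors' exact algorithm): momentum-based
# stochastic policy gradient on a toy multi-armed bandit with a
# softmax-parameterized policy. All constants are illustrative.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.8])   # expected reward of each arm (assumed)
num_actions = len(true_means)

theta = np.zeros(num_actions)            # softmax policy parameters
momentum = np.zeros(num_actions)         # running momentum of the gradient estimate
beta, step_size = 0.9, 0.5               # momentum weight and learning rate (assumed)

def softmax(logits):
    z = logits - logits.max()            # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

for t in range(2000):
    probs = softmax(theta)
    action = rng.choice(num_actions, p=probs)
    reward = rng.normal(true_means[action], 0.1)

    # REINFORCE (score-function) gradient estimate for the softmax policy:
    # d/dtheta_i log pi(action) = 1{i == action} - pi_i, scaled by the reward.
    grad_log_pi = -probs
    grad_log_pi[action] += 1.0
    grad_estimate = reward * grad_log_pi

    # Momentum-based policy gradient step: exponentially average the noisy
    # gradient estimates, then move along the averaged direction.
    momentum = beta * momentum + (1.0 - beta) * grad_estimate
    theta = theta + step_size * momentum

print("learned action probabilities:", softmax(theta).round(3))
```

Running the sketch concentrates probability mass on the highest-reward arm; the momentum term simply smooths the noisy per-sample gradient estimates, which is the intuition behind the improved sample complexity discussed above.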