BriefGPT.xyz
Jul, 2020
Linear Convergence of Policy Gradient Methods for Finite MDPs
A Note on the Linear Convergence of Policy Gradient Methods
Jalaj Bhandari, Daniel Russo
TL;DR
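This paper revisits the finite-time analysis of policy gradient methods for MDPs with finite state and action spaces. Drawing on a connection to policy iteration, it shows that many variants of policy gradient succeed with large step sizes and attain a linear rate of convergence.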
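To make the policy iteration connection concrete, here is a minimal sketch of plain policy iteration on a small random MDP. It is not the paper's algorithm: it only illustrates the classical baseline, whose suboptimality gap shrinks by at least a factor of the discount per step (a linear, i.e. geometric, rate). The helpers `random_mdp`, `evaluate`, `optimal_values`, and `run_demo`, as well as the MDP sizes, seed, and discount, are all assumptions chosen for illustration.

```python
import random

def random_mdp(n_states, n_actions, seed):
    """Hypothetical helper: random transition kernel P[s][a][s'] and rewards R[s][a]."""
    rng = random.Random(seed)
    P, R = [], []
    for _ in range(n_states):
        rows = []
        for _ in range(n_actions):
            w = [rng.random() for _ in range(n_states)]
            total = sum(w)
            rows.append([x / total for x in w])  # normalize to a distribution
        P.append(rows)
        R.append([rng.random() for _ in range(n_actions)])
    return P, R

def backup(P, R, V, gamma, s, a):
    """One-step lookahead: r(s, a) + gamma * E[V(s')]."""
    return R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))

def evaluate(P, R, policy, gamma, tol=1e-12):
    """Value of a deterministic policy via repeated Bellman backups."""
    V = [0.0] * len(R)
    while True:
        V_new = [backup(P, R, V, gamma, s, policy[s]) for s in range(len(R))]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new
        V = V_new

def optimal_values(P, R, gamma, tol=1e-12):
    """Optimal value function via value iteration (used only to measure the gap)."""
    V = [0.0] * len(R)
    while True:
        V_new = [max(backup(P, R, V, gamma, s, a) for a in range(len(R[s])))
                 for s in range(len(R))]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new
        V = V_new

def run_demo(gamma=0.9, iters=10, seed=0):
    """Run policy iteration and record max_s (V*(s) - V_k(s)) at each step."""
    P, R = random_mdp(n_states=5, n_actions=3, seed=seed)
    V_star = optimal_values(P, R, gamma)
    policy = [0] * len(R)  # arbitrary starting policy
    gaps = []
    for _ in range(iters):
        V = evaluate(P, R, policy, gamma)
        gaps.append(max(vs - v for vs, v in zip(V_star, V)))
        # Greedy improvement step: pick the best action under the current values.
        policy = [max(range(len(R[s])), key=lambda a: backup(P, R, V, gamma, s, a))
                  for s in range(len(R))]
    return gaps

gamma = 0.9
gaps = run_demo(gamma=gamma)
for k, g in enumerate(gaps):
    print(f"iteration {k}: suboptimality gap = {g:.3e}")
```

Each recorded gap should be at most `gamma` times the previous one, up to numerical tolerance; this is the standard contraction bound for policy iteration with exact evaluation.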
Abstract
We revisit the finite time analysis of policy gradient methods in the simplest setting: finite state and action problems with a policy class consisting of all …