BriefGPT.xyz
Jan, 2022
Convergence of policy gradient for entropy regularized MDPs with neural network approximation in the mean-field regime
Bekzhan Kerimkulov, James-Michael Leahy, David Šiška, Lukasz Szpruch
TL;DR
This paper studies the global convergence of policy gradient for infinite-horizon, entropy-regularized Markov decision processes with continuous state and action spaces, and proves that, under sufficiently strong regularization, the gradient flow converges exponentially fast to the unique stationary solution.
Abstract
We study the global convergence of policy gradient for infinite-horizon, continuous state and action space, entropy-regularized Markov decision processes …
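
For context, the entropy-regularized objective underlying this line of work typically takes the following standard soft-RL form. This is a sketch assembled from common conventions, not from the paper itself; the symbols $\tau$ (regularization temperature), $\gamma$ (discount factor), and the discounted-sum formulation are assumptions — the paper's exact mean-field, infinite-horizon formulation may differ.

```latex
% Entropy-regularized value of a stochastic policy \pi (standard soft-RL sketch):
% \tau > 0 is the temperature, \gamma \in (0,1) the discount factor,
% and \mathcal{H} is the differential entropy over the action space \mathcal{A}.
V^{\tau}(\pi)
  = \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}
      \Big( r(s_t, a_t) + \tau\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big) \right],
\qquad
\mathcal{H}(\mu) = -\int_{\mathcal{A}} \log \mu(a)\, \mu(\mathrm{d}a).
```

The entropy term makes the objective strongly concave in the policy's action distribution, which is the mechanism that typically yields a unique stationary solution and exponential convergence of the gradient flow, as the TL;DR above describes.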