BriefGPT.xyz
May, 2021
On the Linear Convergence of Natural Policy Gradient Algorithm
Sajad Khodadadian, Prakirt Raj Jhunjhunwala, Sushil Mahavir Varma, Siva Theja Maguluri
TL;DR
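This paper studies the natural policy gradient algorithm applied to Markov decision processes, proposes an improved variant with adaptive step sizes, and experimentally compares different variants of policy gradient methods.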
Abstract
Markov decision processes are classically solved using Value Iteration and Policy Iteration algorithms. Recent interest in reinforcement learning has motivated the study of methods inspired by optimization, such …
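For readers unfamiliar with the method named in the title, here is a minimal sketch of a natural policy gradient iteration on a tabular MDP. It assumes the commonly used softmax parameterization, under which the update takes the multiplicative form pi_new(a|s) proportional to pi(a|s) * exp(eta * Q(s, a)). The toy two-state MDP, the fixed step size `eta`, and all function names are illustrative assumptions, not taken from the paper, and the adaptive step-size variant mentioned in the TL;DR is not reproduced here.

```python
import math

def policy_evaluation(P, R, pi, gamma, sweeps=500):
    """Approximate Q^pi by repeated Bellman backups (hypothetical helper)."""
    n_states, n_actions = len(R), len(R[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(sweeps):
        # State values under the current policy.
        V = [sum(pi[s][a] * Q[s][a] for a in range(n_actions))
             for s in range(n_states)]
        Q = [[R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_states))
              for a in range(n_actions)]
             for s in range(n_states)]
    return Q

def npg_step(pi, Q, eta):
    """One softmax NPG update: pi_new(a|s) proportional to pi(a|s) * exp(eta * Q(s, a))."""
    new_pi = []
    for s in range(len(pi)):
        q_max = max(Q[s])  # subtract the max for numerical stability
        weights = [pi[s][a] * math.exp(eta * (Q[s][a] - q_max))
                   for a in range(len(pi[s]))]
        total = sum(weights)
        new_pi.append([w / total for w in weights])
    return new_pi

def run_npg(P, R, gamma=0.9, eta=1.0, iters=50):
    """Run NPG from the uniform policy with a fixed (assumed) step size eta."""
    n_states, n_actions = len(R), len(R[0])
    pi = [[1.0 / n_actions] * n_actions for _ in range(n_states)]
    for _ in range(iters):
        Q = policy_evaluation(P, R, pi, gamma)
        pi = npg_step(pi, Q, eta)
    return pi

# Toy MDP (assumed for illustration): action 0 stays, action 1 switches state.
# P[s][a] is the next-state distribution; reward is 1 when the next state is 1.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 1.0],
     [1.0, 0.0]]

pi = run_npg(P, R)
print(pi)  # probability mass concentrates on "switch" in state 0 and "stay" in state 1
```

In this toy problem the policy evaluation is done to near convergence at every iteration, which is a simplification; sample-based settings would replace `policy_evaluation` with an estimate of the action values.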