Linear Convergence of Natural Policy Gradient Methods with Log-Linear Policies

Oct 2022

Rui Yuan, Simon S. Du, Robert M. Gower, Alessandro Lazaric, Lin Xiao

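TL;DR: This work considers infinite-horizon discounted Markov decision processes and studies the convergence rates and sample complexity of the natural policy gradient (NPG) and Q-NPG methods with the log-linear policy class. Using a non-adaptive, geometrically increasing step size, both methods achieve a linear convergence rate and a sample complexity of roughly O(1/ε²).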
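To make the setting concrete, here is a minimal sketch of a Q-NPG-style update with a log-linear policy π_θ(a|s) ∝ exp(φ(s,a)ᵀθ): each iteration fits the Q-function of the current policy by least squares in feature space and moves θ along the fitted weights, with a step size that grows by a factor of 1/γ per iteration. Everything concrete here is an assumption for illustration: the two-state MDP (`P`, `R`, `GAMMA`), the one-hot feature map `phi`, uniform least-squares weights over state-action pairs, η₀ = 1, and exact model-based Q-values in place of sampled estimates. The helper names (`q_npg`, `fit_q`, `evaluate`, etc.) are hypothetical, and the sample-based estimation analyzed in the paper and the NPG variant are not shown.

```python
import math

# Hypothetical toy MDP (not from the paper): 2 states, 2 actions, gamma = 0.9.
GAMMA = 0.9
N_S, N_A = 2, 2
P = [[[0.9, 0.1], [0.1, 0.9]],  # P[s][a][s']: transition probabilities
     [[0.8, 0.2], [0.2, 0.8]]]
R = [[0.0, 0.0], [1.0, 0.5]]    # R[s][a]: expected one-step reward


def phi(s, a):
    """Feature map phi(s, a); one-hot here, but any fixed map works."""
    f = [0.0] * (N_S * N_A)
    f[s * N_A + a] = 1.0
    return f


def policy(theta, s):
    """Log-linear policy: pi(a|s) proportional to exp(phi(s, a) . theta)."""
    logits = [sum(f * t for f, t in zip(phi(s, a), theta)) for a in range(N_A)]
    m = max(logits)  # shift by the max logit for numerical stability
    e = [math.exp(x - m) for x in logits]
    z = sum(e)
    return [x / z for x in e]


def backup(v, s, a):
    """One-step Bellman backup r(s, a) + gamma * E[v(s')]."""
    return R[s][a] + GAMMA * sum(P[s][a][t] * v[t] for t in range(N_S))


def evaluate(theta, iters=500):
    """Q^pi and V^pi by iterative policy evaluation on the known model."""
    pis = [policy(theta, s) for s in range(N_S)]
    v = [0.0] * N_S
    for _ in range(iters):
        v = [sum(pis[s][a] * backup(v, s, a) for a in range(N_A))
             for s in range(N_S)]
    return [[backup(v, s, a) for a in range(N_A)] for s in range(N_S)], v


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_q(q, ridge=1e-10):
    """Least-squares w minimizing sum_{s,a} (phi(s,a).w - Q(s,a))^2.

    Uniform weights over (s, a) are an assumption; a tiny ridge keeps the
    normal equations well posed for arbitrary feature maps.
    """
    d = len(phi(0, 0))
    A = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    for s in range(N_S):
        for a in range(N_A):
            f = phi(s, a)
            for i in range(d):
                b[i] += f[i] * q[s][a]
                for j in range(d):
                    A[i][j] += f[i] * f[j]
    return solve(A, b)


def q_npg(iterations=50, eta0=1.0):
    """Q-NPG sketch: theta <- theta + eta_k * w_k, eta_{k+1} = eta_k / gamma."""
    theta = [0.0] * len(phi(0, 0))  # theta = 0 is the uniform policy
    eta = eta0
    for _ in range(iterations):
        q, _ = evaluate(theta)
        w = fit_q(q)
        theta = [t + eta * wi for t, wi in zip(theta, w)]
        eta /= GAMMA  # non-adaptive, geometrically increasing step size
    return theta
```

With one-hot features the least-squares fit recovers Q^{π_k} exactly, so the update reduces to the tabular rule π_{k+1}(a|s) ∝ π_k(a|s) exp(η_k Q^{π_k}(s,a)). By our own hand calculation, `q_npg(50)` yields an essentially deterministic policy choosing action 1 in state 0 and action 0 in state 1, which is optimal for this toy MDP. A lower-dimensional `phi` turns the same loop into genuine function approximation, where the fit carries an approximation error.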

Abstract

We consider infinite-horizon discounted Markov decision processes and study the convergence rates of the natural policy gradient (NPG) and