Linear Convergence of Policy Gradient under Hadamard Parameterization

May, 2023

On the Linear Convergence of Policy Gradient under Hadamard Parameterization

Jiacai Liu, Jinchi Chen, Ke Wei

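TL;DR: This paper studies the convergence of deterministic policy gradient under the Hadamard parameterization in the tabular setting and proves that the algorithm converges linearly in the global sense. Building on this result, it further shows that the algorithm attains a faster local linear convergence rate after $k_0$ iterations, where $k_0$ is a constant that depends only on the Markov decision process (MDP) and the step size. Overall, the algorithm exhibits a linear convergence rate over all iterations, albeit with a looser constant.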
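To make the setting concrete, here is a minimal sketch of exact (deterministic) policy gradient on a toy tabular MDP. It is not the authors' exact update rule. It assumes that the Hadamard parameterization means $\pi(a|s) = \theta_{s,a}^2$ with each row $\theta_s$ kept on the unit sphere, and it enforces that constraint by renormalizing after every gradient step. The two-state MDP, the constants `GAMMA` and `MU`, and all function names are hypothetical and chosen only for illustration.

```python
import math

GAMMA = 0.9
# Hypothetical toy MDP: action 0 stays in place, action 1 moves to the other state.
NEXT = [[0, 1], [1, 0]]              # NEXT[s][a] = next state (deterministic)
REWARD = [[0.0, 0.0], [1.0, 1.0]]    # REWARD[s][a]; only state 1 pays a reward
MU = [0.5, 0.5]                      # initial state distribution


def policy(theta):
    """Assumed Hadamard parameterization: pi(a|s) = theta[s][a] ** 2."""
    return [[t * t for t in row] for row in theta]


def evaluate(pi, sweeps=500):
    """Return V, Q and the discounted state visitation d_mu of policy pi."""
    v = [0.0, 0.0]
    for _ in range(sweeps):  # fixed-point iteration for V = r_pi + gamma * P_pi V
        v = [sum(pi[s][a] * (REWARD[s][a] + GAMMA * v[NEXT[s][a]])
                 for a in range(2)) for s in range(2)]
    q = [[REWARD[s][a] + GAMMA * v[NEXT[s][a]] for a in range(2)]
         for s in range(2)]
    d = MU[:]
    for _ in range(sweeps):  # fixed-point iteration for d = (1-gamma) mu + gamma d P_pi
        nxt = [(1 - GAMMA) * MU[s] for s in range(2)]
        for s in range(2):
            for a in range(2):
                nxt[NEXT[s][a]] += GAMMA * d[s] * pi[s][a]
        d = nxt
    return v, q, d


def value(theta):
    """Expected discounted return V(mu) under the policy induced by theta."""
    v, _, _ = evaluate(policy(theta))
    return sum(MU[s] * v[s] for s in range(2))


def step(theta, eta):
    """One exact gradient ascent step, then renormalize each row to unit norm."""
    _, q, d = evaluate(policy(theta))
    new_theta = []
    for s in range(2):
        # Chain rule: dV/dtheta[s][a] = d(s) * Q(s,a) / (1-gamma) * 2 * theta[s][a]
        row = [theta[s][a]
               + eta * 2.0 * theta[s][a] * d[s] * q[s][a] / (1 - GAMMA)
               for a in range(2)]
        norm = math.sqrt(sum(t * t for t in row))
        new_theta.append([t / norm for t in row])
    return new_theta


def run(steps=200, eta=0.01):
    """Start from the uniform policy; return (initial value, final value, theta)."""
    theta = [[math.sqrt(0.5)] * 2 for _ in range(2)]
    v_init = value(theta)
    for _ in range(steps):
        theta = step(theta, eta)
    return v_init, value(theta), theta


if __name__ == "__main__":
    v_init, v_final, theta = run()
    print("initial value:", v_init)
    print("final value:  ", v_final)
    print("policy:       ", policy(theta))
```

Logging the gap between the current value and the optimal value at each step, on a log scale, is one way to observe the kind of rate the TL;DR describes. The constants in any such plot are specific to this toy MDP and step size.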

Abstract

The convergence of deterministic policy gradient under the Hadamard parametrization is studied in the tabular setting and the global linear conve