This work studies an independent natural policy gradient (NPG) algorithm for the multi-agent reinforcement learning problem in Markov potential games. It is shown that, under mild technical assumptions and the introduction of a suboptimality gap, the independent NPG method, given an oracle that provides exact policy evaluation, asymptotically reaches an $\epsilon$-Nash equilibrium (NE) within $\mathcal{O}(1/\epsilon)$ iterations. This improves on the previous best result of $\mathcal{O}(1/\epsilon^2)$ iterations and matches the $\mathcal{O}(1/\epsilon)$ rate achievable in the single-agent case. Empirical results on a synthetic potential game and a congestion game are presented to verify the theoretical bounds.
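To make the setting concrete, the sketch below runs independent NPG with an exact evaluation oracle on a toy two-player identical-interest game, which is a potential game whose potential equals the shared payoff. It is a minimal illustration under stated assumptions, not the authors' implementation. The assumptions are: a single-state (stateless) game standing in for a Markov potential game; a tabular softmax policy parameterization, under which one NPG step reduces to the multiplicative update $\pi_i^{t+1}(a) \propto \pi_i^t(a)\exp(\eta A_i(a))$ on each agent's own advantage; and the hypothetical helper names `softmax_npg_step`, `exact_q`, `nash_gap`, and `independent_npg`, chosen for illustration only.

```python
import math


def softmax_npg_step(policy, q_values, eta):
    """One NPG step for a single-state softmax policy (assumed parameterization):
    pi'(a) is proportional to pi(a) * exp(eta * Q(a)).
    Using Q instead of the advantage A = Q - V gives the same result,
    because the baseline V cancels after normalization."""
    m = max(q_values)  # shift exponents for numerical stability
    weights = [p * math.exp(eta * (q - m)) for p, q in zip(policy, q_values)]
    total = sum(weights)
    return [w / total for w in weights]


def exact_q(payoff, pi1, pi2):
    """Exact policy evaluation (the 'oracle'): each agent's expected payoff
    for each of its own actions, given the other agent's current policy."""
    q1 = [sum(payoff[a][b] * pi2[b] for b in range(len(pi2))) for a in range(len(pi1))]
    q2 = [sum(payoff[a][b] * pi1[a] for a in range(len(pi1))) for b in range(len(pi2))]
    return q1, q2


def nash_gap(payoff, pi1, pi2):
    """Largest gain any single agent could obtain by a unilateral deviation;
    the joint policy is an epsilon-NE when this value is at most epsilon."""
    q1, q2 = exact_q(payoff, pi1, pi2)
    v1 = sum(p * q for p, q in zip(pi1, q1))
    v2 = sum(p * q for p, q in zip(pi2, q2))
    return max(max(q1) - v1, max(q2) - v2)


def independent_npg(payoff, eta=0.5, iters=200):
    """Run simultaneous, independent NPG updates from uniform policies and
    record the NE gap after every iteration."""
    n1, n2 = len(payoff), len(payoff[0])
    pi1 = [1.0 / n1] * n1
    pi2 = [1.0 / n2] * n2
    gaps = []
    for _ in range(iters):
        q1, q2 = exact_q(payoff, pi1, pi2)
        # Each agent updates using only its own Q-values (independent learning).
        pi1, pi2 = softmax_npg_step(pi1, q1, eta), softmax_npg_step(pi2, q2, eta)
        gaps.append(nash_gap(payoff, pi1, pi2))
    return pi1, pi2, gaps


if __name__ == "__main__":
    # Coordination game: both players prefer to match, and (1, 1) pays most.
    payoff = [[1.0, 0.0], [0.0, 2.0]]
    pi1, pi2, gaps = independent_npg(payoff)
    print(pi1, pi2, gaps[-1])
```

On this toy coordination game, both agents start uniform, their Q-values already favor action 1, and the iterates concentrate on the joint action $(1, 1)$ that maximizes the potential, with the NE gap shrinking toward zero. The step size `eta` and iteration count are arbitrary illustrative choices, not values from the paper.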

Provably Fast Convergence of Independent Natural Policy Gradient for Markov Potential Games