Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games

Feb 2021

Chen-Yu Wei, Chung-Wei Lee, Mengxiao Zhang, Haipeng Luo

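TL;DR: Studies infinite-horizon discounted two-player zero-sum Markov games and develops a decentralized algorithm that converges to a Nash equilibrium when both players run it in self-play.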

Abstract

We study infinite-horizon discounted two-player zero-sum Markov games and develop a decentralized algorithm that provably converges to the set of …
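The paper's algorithm works on Markov games with per-state updates; as a much simpler, hedged illustration of the optimistic gradient descent/ascent (OGDA) update named in the title, the sketch below runs projected OGDA on a single-state zero-sum matrix game (matching pennies). This is not the authors' algorithm. The helper names `project_simplex` and `ogda_matrix_game`, the step size, and the starting points are illustrative assumptions. Each player updates using only its own observed gradient vector, which mirrors the decentralized setting, and the *last* iterate (not an average) approaches the equilibrium (0.5, 0.5). Plain simultaneous gradient descent/ascent is known to cycle or spiral in this game.

```python
def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        css += uk
        t = (css - 1.0) / k
        if uk - t > 0:  # keep the threshold for the largest valid k
            theta = t
    return [max(vi - theta, 0.0) for vi in v]


def mat_vec(A, y):
    """A @ y: loss vector seen by the min player (x)."""
    return [sum(a * b for a, b in zip(row, y)) for row in A]


def mat_t_vec(A, x):
    """A^T @ x: payoff vector seen by the max player (y)."""
    return [sum(A[i][j] * x[i] for i in range(len(x))) for j in range(len(A[0]))]


def ogda_matrix_game(A, x_hat, y_hat, eta=0.1, steps=2000):
    """Projected OGDA for min_x max_y x^T A y (hypothetical helper).

    x_hat / y_hat are the auxiliary sequences; x / y are the played strategies.
    """
    # Initial "prediction" gradients: use the starting points themselves.
    gx_prev = mat_vec(A, y_hat)
    gy_prev = mat_t_vec(A, x_hat)
    x, y = x_hat, y_hat
    for _ in range(steps):
        # Optimistic step: move from the auxiliary point using last round's gradient.
        x = project_simplex([xh - eta * g for xh, g in zip(x_hat, gx_prev)])
        y = project_simplex([yh + eta * g for yh, g in zip(y_hat, gy_prev)])
        # Each player observes only its own gradient at the played strategies.
        gx = mat_vec(A, y)
        gy = mat_t_vec(A, x)
        # Update the auxiliary sequences with the freshly observed gradients.
        x_hat = project_simplex([xh - eta * g for xh, g in zip(x_hat, gx)])
        y_hat = project_simplex([yh + eta * g for yh, g in zip(y_hat, gy)])
        gx_prev, gy_prev = gx, gy
    return x, y


if __name__ == "__main__":
    A = [[1.0, -1.0], [-1.0, 1.0]]  # matching pennies; equilibrium is (0.5, 0.5)
    x, y = ogda_matrix_game(A, [0.7, 0.3], [0.4, 0.6])
    print(x, y)  # last iterate, expected to be close to [0.5, 0.5] for both players
```

With the step size used here the iterates stay in the interior of the simplex, so the projection only matters for starting points or step sizes that push strategies to the boundary.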