BriefGPT.xyz
Feb, 2021
Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games
Chen-Yu Wei, Chung-Wei Lee, Mengxiao Zhang, Haipeng Luo
TL;DR
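Studies infinite-horizon discounted two-player zero-sum Markov games and develops a decentralized algorithm that converges to a Nash equilibrium under self-play.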
Abstract
We study infinite-horizon discounted two-player zero-sum Markov games, and develop a decentralized algorithm that provably converges to the set of …