Algorithms based on regret matching, specifically regret matching$^+$ (RM$^+$) and its variants, are the most popular approaches for solving large-scale two-player zero-sum games in practice. Unlike algorithms such as optimistic gradient descent ascent, which have strong last-iterate and ergodic convergence properties for zero-sum games, virtually nothing is known about the last-iterate properties of regret-matching algorithms. Given the importance of last-iterate convergence for numerical optimization and its relevance to modeling real-world learning in games, in this paper we study the last-iterate convergence properties of various popular variants of RM$^+$. First, we show numerically that several practical variants, such as simultaneous RM$^+$, alternating RM$^+$, and simultaneous predictive RM$^+$, all lack last-iterate convergence guarantees even on a simple $3\times 3$ game. We then prove that recent variants of these algorithms based on a smoothing technique do enjoy last-iterate convergence: extragradient RM$^+$ and smooth Predictive RM$^+$ enjoy asymptotic last-iterate convergence (without a rate) and $1/\sqrt{t}$ best-iterate convergence. Finally, we introduce restarted variants of these algorithms and show that they enjoy linear-rate last-iterate convergence.
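To make the distinction between last-iterate and average-iterate behavior concrete, the following is a minimal sketch of simultaneous RM$^+$ on a matrix game $\min_x \max_y x^\top A y$. It is an illustration under stated assumptions, not the implementation used in the paper: the payoff matrix `A`, the function names, and the iteration count are all hypothetical choices, and `A` is not the $3\times 3$ instance studied in the paper. The sketch records the duality gap of each iterate alongside the uniformly averaged strategies, so the two notions of convergence can be compared.

```python
import numpy as np

def normalize(regrets):
    """Map a nonnegative regret vector to a strategy (uniform if all zero)."""
    total = regrets.sum()
    if total > 0:
        return regrets / total
    return np.full(len(regrets), 1.0 / len(regrets))

def duality_gap(A, x, y):
    """Duality gap for min_x max_y x^T A y; it is zero exactly at a Nash equilibrium."""
    return float(np.max(A.T @ x) - np.min(A @ y))

def simultaneous_rm_plus(A, num_iters):
    """Run simultaneous RM+ and return averaged strategies plus per-iterate gaps."""
    n, m = A.shape
    Rx, Ry = np.zeros(n), np.zeros(m)        # thresholded cumulative regrets
    x_sum, y_sum = np.zeros(n), np.zeros(m)  # running sums for uniform averaging
    last_gaps = []
    for _ in range(num_iters):
        x, y = normalize(Rx), normalize(Ry)
        x_sum += x
        y_sum += y
        last_gaps.append(duality_gap(A, x, y))
        loss_x = A @ y       # losses of the row player (minimizer)
        util_y = A.T @ x     # utilities of the column player (maximizer)
        value = x @ loss_x   # payoff x^T A y of the current strategy pair
        # RM+ update: add the instantaneous regret, then clip at zero.
        Rx = np.maximum(Rx + (value - loss_x), 0.0)
        Ry = np.maximum(Ry + (util_y - value), 0.0)
    return x_sum / num_iters, y_sum / num_iters, last_gaps

# Hypothetical 3x3 zero-sum game (an assumption for illustration only).
A = np.array([[0.0, -1.0, 2.0],
              [1.0, 0.0, -1.0],
              [-2.0, 1.0, 0.0]])
x_avg, y_avg, gaps = simultaneous_rm_plus(A, num_iters=10000)
print("gap of average iterate:", duality_gap(A, x_avg, y_avg))
print("gap of last iterate:   ", gaps[-1])
```

The averaged strategies inherit the standard $O(1/\sqrt{T})$ regret-based guarantee, whereas nothing in this update rule forces the last iterate's gap to shrink, which is the gap in understanding that the abstract refers to.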

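This work studies the last-iterate convergence of algorithms based on regret matching+ (RM+) and its variants for solving large-scale two-player zero-sum games, and shows through numerical experiments that several practical variants fail to exhibit last-iterate convergence even on a simple 3×3 game. It further proves that recent variants based on a smoothing technique, such as extragradient RM+ and smooth Predictive RM+, enjoy asymptotic last-iterate convergence as well as 1/√t best-iterate convergence. Finally, it introduces restarted variants of these algorithms and proves that they achieve last-iterate convergence at a linear rate.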

Last-Iterate Convergence Properties of Regret-Matching Algorithms in Games