In this work, we study potential games and Markov potential games under stochastic costs and bandit feedback. We propose a variant of the Frank-Wolfe algorithm with sufficient exploration and recursive gradient estimation, which provably converges to a Nash equilibrium while attaining sublinear regret for each individual player. For potential games, our algorithm simultaneously achieves a Nash regret and a regret bound of $O(T^{4/5})$, matching the best available result without additional projection steps. By carefully balancing the reuse of past samples with the exploration of new ones, we then extend these results to Markov potential games and improve the best available Nash regret from $O(T^{5/6})$ to $O(T^{4/5})$. Moreover, our algorithm requires no knowledge of the game, such as the distribution mismatch coefficient, which makes it more flexible to implement in practice. Experimental results corroborate our theoretical findings and underscore the practical effectiveness of our method.
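
The abstract names three ingredients: an exploration mixture, a bandit gradient estimate that is averaged recursively, and a projection-free Frank-Wolfe update. The sketch below combines them for players with finite action sets on a toy identical-cost game, which is a simple potential game. It is a minimal illustration under stated assumptions, not the algorithm analyzed here. The uniform exploration mixture, the importance-weighted cost estimator, the momentum-style recursion $d \leftarrow (1-\rho)d + \rho\hat{g}$, the step-size schedules, the cost matrix, and all helper names (`explore`, `bandit_cost_estimate`, `recursive_update`, `frank_wolfe_step`, `run`) are hypothetical choices that this abstract does not specify.

```python
import numpy as np

def explore(x, delta):
    """Assumed exploration scheme: mix the iterate with the uniform distribution."""
    n = x.shape[0]
    return (1.0 - delta) * x + delta / n

def bandit_cost_estimate(pi, action, cost):
    """Importance-weighted estimate of the full cost vector from one observed cost."""
    est = np.zeros_like(pi)
    est[action] = cost / pi[action]
    return est

def recursive_update(d, g_hat, rho):
    """Assumed recursive (momentum-style) gradient estimate: d <- (1 - rho) d + rho g_hat."""
    return (1.0 - rho) * d + rho * g_hat

def frank_wolfe_step(x, d, eta):
    """Linear minimization over the simplex picks a vertex, so no projection is needed."""
    s = np.zeros_like(x)
    s[np.argmin(d)] = 1.0
    return (1.0 - eta) * x + eta * s

def run(T=5000, seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical 2-player identical-cost game: mean cost of joint action (a1, a2).
    mean_cost = np.array([[0.2, 0.9], [0.9, 0.6]])
    xs = [np.full(2, 0.5), np.full(2, 0.5)]
    ds = [np.zeros(2), np.zeros(2)]
    for t in range(1, T + 1):
        delta = min(1.0, t ** -0.2)  # assumed exploration schedule
        rho = min(1.0, t ** -0.5)    # assumed averaging weight
        eta = 1.0 / (t + 1)          # assumed Frank-Wolfe step size
        pis = [explore(x, delta) for x in xs]
        acts = [rng.choice(2, p=pi) for pi in pis]
        # Bandit feedback: each player sees only the noisy cost of the joint action played.
        cost = mean_cost[acts[0], acts[1]] + 0.1 * rng.standard_normal()
        for i in range(2):
            g_hat = bandit_cost_estimate(pis[i], acts[i], cost)
            ds[i] = recursive_update(ds[i], g_hat, rho)
            xs[i] = frank_wolfe_step(xs[i], ds[i], eta)
    return xs

print(run())
```

Because the linear minimization over the simplex only selects a vertex, each iterate remains a convex combination of valid mixed strategies, so it stays on the simplex without a projection step. This is the property the abstract emphasizes.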

Convergence to Nash Equilibrium and No-Regret Guarantee in (Markov) Potential Games