We show that learning algorithms satisfying a $\textit{low approximate regret}$ property experience fast convergence to approximate optimality in a large class of repeated games. Our property, which simply requires that each learner has small regret compared to a $(1+\epsilon)$-multiplicative approximation to the best action in hindsight, is ubiquitous among learning algorithms; it is satisfied even by the vanilla Hedge forecaster. Our results improve upon recent work of Syrgkanis et al. [SALS15] in a number of ways. We improve upon the speed of convergence by a factor of $n$, the number of players, and require only that the players observe payoffs under other players' realized actions, as opposed to expected payoffs. We further show that convergence occurs with high probability, and under certain conditions show convergence under bandit feedback. Both the scope of settings and the class of algorithms for which our analysis provides fast convergence are considerably broader than in previous work. Our framework applies to dynamic population games via a low approximate regret property for shifting experts. Here we strengthen the results of Lykouris et al. [LST16] in two ways: we allow players to select learning algorithms from a larger class, which includes a minor variant of the basic Hedge algorithm, and we increase the maximum churn in players for which approximate optimality is achieved. In the bandit setting we present a novel algorithm which provides a "small loss"-type bound with improved dependence on the number of actions and is both simple and efficient. This result may be of independent interest.
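
To make the low approximate regret property concrete, here is a minimal Python sketch of vanilla Hedge on full-information losses in $[0,1]$. It is an illustration only, not code from the paper. The function names `hedge` and `approx_regret_bound` are hypothetical, and the constants in the bound come from the classical Hedge analysis with learning rate $\eta = \epsilon \le 1$, which is an assumption of this sketch rather than the paper's exact statement.

```python
import math
import random


def hedge(loss_rows, eta):
    """Run vanilla Hedge on a sequence of loss vectors with entries in [0, 1].

    Returns (learner_loss, best_loss): the learner's cumulative expected
    loss and the cumulative loss of the best fixed action in hindsight.
    """
    d = len(loss_rows[0])
    log_w = [0.0] * d      # log-weights, kept in log space for stability
    cum_loss = [0.0] * d   # cumulative loss of each fixed action
    learner_loss = 0.0
    for losses in loss_rows:
        # Normalize the exponential weights into a probability vector.
        m = max(log_w)
        w = [math.exp(v - m) for v in log_w]
        z = sum(w)
        p = [v / z for v in w]
        # The learner suffers the expected loss under p.
        learner_loss += sum(pi * li for pi, li in zip(p, losses))
        # Multiplicative update: w_i <- w_i * exp(-eta * loss_i).
        for i, li in enumerate(losses):
            log_w[i] -= eta * li
            cum_loss[i] += li
    return learner_loss, min(cum_loss)


def approx_regret_bound(best_loss, d, eps):
    """Assumed bound for this sketch: (1 + eps) * best + 2 * ln(d) / eps.

    It follows from the classical Hedge guarantee
    (eta * best + ln d) / (1 - exp(-eta)) with eta = eps <= 1.
    """
    return (1 + eps) * best_loss + 2 * math.log(d) / eps


if __name__ == "__main__":
    rng = random.Random(0)
    d, T, eps = 5, 2000, 0.1
    # Synthetic losses (made up for the demo): action 0 is slightly better.
    rows = [[rng.random() * (0.8 if i == 0 else 1.0) for i in range(d)]
            for _ in range(T)]
    learner, best = hedge(rows, eta=eps)
    print(learner, best, approx_regret_bound(best, d, eps))
```

The additive term scales with $\log(d)/\epsilon$ and does not depend on the horizon $T$, which is the kind of regret guarantee the abstract refers to.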

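This paper shows that learning algorithms with the low approximate regret property, including the basic Hedge algorithm, converge quickly to approximately optimal outcomes in a large class of repeated games. The authors also improve on earlier results and apply the framework to dynamic population games, where they admit a larger class of learning algorithms and tolerate a higher rate of player churn. Finally, they propose a new algorithm for the bandit setting that is simple and efficient and achieves a "small loss"-type bound.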

Learning in Games: Robustness of Fast Convergence