This paper presents a general mean-field game (GMFG) framework for
simultaneous learning and decision-making in stochastic games with a large
population. It first establishes the existence of a unique Nash equilibrium for
this GMFG, and explains that naively combining Q-learning with the fixed-point
approach of classical MFGs yields unstable algorithms. It then proposes a
Q-learning algorithm with a Boltzmann policy (GMF-Q) and analyzes its
convergence properties and computational complexity. Experiments on repeated
ad auction problems demonstrate that GMF-Q is efficient and robust in terms of
convergence and learning accuracy. Moreover, it outperforms existing
multi-agent reinforcement learning algorithms in convergence, stability, and
learning ability.
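
For readers unfamiliar with the term, a Boltzmann policy selects actions with
probability proportional to the exponential of their Q-values. The sketch below
is only an illustration under stated assumptions: the function name
`boltzmann_policy`, the `temperature` parameter, and the example Q-values are
hypothetical and are not taken from the paper, which may parameterize the policy
differently.

```python
import math


def boltzmann_policy(q_values, temperature=1.0):
    """Return softmax action probabilities over a list of Q-values.

    Hypothetical helper for illustration; `temperature` is an assumed
    parameter controlling how sharply the policy favors high Q-values.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


# Example with made-up Q-values for three actions.
probs = boltzmann_policy([1.0, 2.0, 3.0], temperature=0.5)
uniform = boltzmann_policy([0.0, 0.0, 0.0, 0.0])
```

A lower temperature concentrates probability on the action with the highest
Q-value, while a higher temperature moves the policy toward uniform sampling.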

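This paper proposes a general mean-field game (GMFG) framework for learning and decision-making in stochastic games with a large population. It proposes a Q-learning algorithm with a Boltzmann policy (GMF-Q) and analyzes its convergence properties and computational complexity. Experiments show that GMF-Q is efficient and robust in terms of convergence and learning accuracy, and that it performs better than existing multi-agent reinforcement learning algorithms.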