Existing multi-agent reinforcement learning methods are limited typically to a small number of agents. When the agent number increases largely, the learning becomes intractable due to the curse of the dimensionality and the exponential growth of user interactions. In this paper, we present Mean Field Reinforcement Learning where the interactions within the population of agents are approximated by those between a single agent and the average effect from the overall population or neighboring agents; the interplay between the two entities is mutually reinforced: the learning of the individual agent's optimal policy depends on the dynamics of the population, while the dynamics of the population change according to the collective patterns of the individual policies. We develop practical mean field Q-learning and mean field Actor-Critic algorithms and analyze the convergence of the solution. Experiments on resource allocation, Ising model estimation, and battle game tasks verify the learning effectiveness of our mean field approaches in handling many-agent interactions in population.

本文介绍了平均场强化学习方法，通过该方法可以近似处理不同智能体之间的互动，同时开发了多个实际的基于Q-learning和Actor-Critic的平均场算法模型，并分析了解决纳什均衡的收敛性，在高斯挤压、伊辛模型和博弈游戏等实验中验证了本方法的有效性。同时，作者报告了使用无模型的强化学习方法成功解决了伊辛模型问题。

均场多智能体强化学习