We present two variants of a multi-agent reinforcement learning algorithm
based on evolutionary game-theoretic considerations. The intentional simplicity
of one variant enables us to prove results on its relationship to a system of
ordinary differential equations of replicator-mutator dynamics type, which in
turn lets us prove convergence conditions for the algorithm in various
settings via its ODE counterpart. The more complicated variant enables
comparisons to Q-learning-based algorithms. We compare both variants
experimentally to WoLF-PHC and frequency-adjusted Q-learning on a range of
settings, illustrating cases of increasing dimensionality where our variants
preserve convergence in contrast to more complicated algorithms. The
availability of analytic results makes the findings more transferable than
those of purely empirical case studies, illustrating the general
utility of a dynamical systems perspective on multi-agent reinforcement
learning when addressing questions of convergence and reliable generalisation.

Mutation-Bias Learning in Games
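
The abstract above relates the simpler variant to ODEs of replicator-mutator type but does not state the equations. As a rough illustration only, the sketch below integrates one commonly used single-population form, dx_i/dt = x_i (f_i(x) - fbar(x)) + M (c_i - x_i), on rock-paper-scissors with forward Euler. The specific ODE form, the payoff matrix, the mutation rate M, the uniform mutation target c, and all function names are assumptions made for this example and are not taken from the paper, whose setting is multi-agent.

```python
# Hypothetical sketch: single-population replicator-mutator dynamics
#   dx_i/dt = x_i * (f_i(x) - fbar(x)) + M * (c_i - x_i)
# on zero-sum rock-paper-scissors. Form and parameters are assumptions.

PAYOFF = [
    [0.0, -1.0, 1.0],   # rock     vs rock, paper, scissors
    [1.0, 0.0, -1.0],   # paper
    [-1.0, 1.0, 0.0],   # scissors
]


def replicator_mutator_step(x, payoff, mutation_rate, target, dt):
    """One forward-Euler step of the assumed replicator-mutator ODE."""
    n = len(x)
    # Fitness of each pure strategy against the current mixed strategy.
    fitness = [sum(payoff[i][j] * x[j] for j in range(n)) for i in range(n)]
    mean_fitness = sum(x[i] * fitness[i] for i in range(n))
    return [
        x[i] + dt * (x[i] * (fitness[i] - mean_fitness)
                     + mutation_rate * (target[i] - x[i]))
        for i in range(n)
    ]


def simulate(x0, mutation_rate, steps=20000, dt=0.01):
    """Integrate from x0, with mutation pulling towards the uniform strategy."""
    target = [1.0 / len(x0)] * len(x0)
    x = list(x0)
    for _ in range(steps):
        x = replicator_mutator_step(x, PAYOFF, mutation_rate, target, dt)
    return x


def distance_to_uniform(x):
    """Largest componentwise deviation from the uniform mixed strategy."""
    return max(abs(xi - 1.0 / len(x)) for xi in x)


with_mutation = simulate([0.8, 0.1, 0.1], mutation_rate=0.1)
without_mutation = simulate([0.8, 0.1, 0.1], mutation_rate=0.0)
print("distance to uniform, M = 0.1:", distance_to_uniform(with_mutation))
print("distance to uniform, M = 0.0:", distance_to_uniform(without_mutation))
```

In this toy setting the mutation term damps the cycling of the plain replicator dynamics, so the run with M = 0.1 settles near the uniform strategy while the run with M = 0 keeps orbiting far from it. This is meant only to convey the flavour of the convergence questions the abstract refers to.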

Minimax optimization problems have attracted a lot of attention over the past
few years, with applications ranging from economics to machine learning. While
advanced optimization methods exist for such problems, characterizing their
dynamics in stochastic scenarios remains notably challenging. In this paper, we
pioneer the use of stochastic differential equations (SDEs) to analyze and
compare Minimax optimizers. Our SDE models for Stochastic Gradient
Descent-Ascent, Stochastic Extragradient, and Stochastic Hamiltonian Gradient
Descent are provable approximations of their algorithmic counterparts, clearly
showcasing the interplay between hyperparameters, implicit regularization, and
implicit curvature-induced noise. This perspective also allows for a unified
and simplified analysis strategy based on the principles of Itô calculus.
Finally, our approach facilitates the derivation of convergence conditions and
closed-form solutions for the dynamics in simplified settings, unveiling
further insights into the behavior of different optimizers.

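This work uses stochastic differential equations (SDEs) to analyze and compare minimax optimizers. The SDE models reveal the interplay between hyperparameters, implicit regularization, and implicit curvature-induced noise, and yield convergence conditions and closed-form solutions in simplified settings that give further insight into the behavior of different optimizers.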
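
To make the comparison of optimizers more concrete, the following toy sketch runs two of the discrete-time methods named in the abstract, Stochastic Gradient Descent-Ascent and Stochastic Extragradient, on the bilinear problem min_x max_y x*y with additive Gaussian gradient noise. It simulates the algorithms themselves, not the paper's SDE models. The objective, the noise model (independent samples at the extrapolation and update steps), the step size, and all function names are assumptions chosen for illustration.

```python
import math
import random


def noisy_gradients(x, y, sigma, rng):
    """Gradients of f(x, y) = x * y with additive Gaussian noise (assumed model)."""
    grad_x = y + sigma * rng.gauss(0.0, 1.0)  # df/dx = y
    grad_y = x + sigma * rng.gauss(0.0, 1.0)  # df/dy = x
    return grad_x, grad_y


def sgda(x, y, lr, sigma, steps, rng):
    """Stochastic Gradient Descent-Ascent: descend in x, ascend in y."""
    for _ in range(steps):
        gx, gy = noisy_gradients(x, y, sigma, rng)
        x, y = x - lr * gx, y + lr * gy
    return x, y


def seg(x, y, lr, sigma, steps, rng):
    """Stochastic Extragradient: extrapolate, then update with the look-ahead gradient."""
    for _ in range(steps):
        gx, gy = noisy_gradients(x, y, sigma, rng)
        x_half, y_half = x - lr * gx, y + lr * gy            # extrapolation step
        gx, gy = noisy_gradients(x_half, y_half, sigma, rng)  # fresh noise sample
        x, y = x - lr * gx, y + lr * gy                       # update step
    return x, y


# Distance from the saddle point (0, 0) after the same number of steps.
sgda_dist = math.hypot(*sgda(1.0, 1.0, lr=0.1, sigma=0.1, steps=1000,
                             rng=random.Random(0)))
seg_dist = math.hypot(*seg(1.0, 1.0, lr=0.1, sigma=0.1, steps=1000,
                           rng=random.Random(0)))
print("SGDA distance from saddle:", sgda_dist)
print("SEG distance from saddle:", seg_dist)
```

On this bilinear example the SGDA iterates spiral away from the saddle point, while the SEG iterates contract towards it and then hover in a noise-dominated neighbourhood whose size depends on the step size and noise level. That dependence on hyperparameters and noise is the kind of interplay the abstract says the SDE models are meant to capture.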