Multi-agent simulations provide a scalable environment for learning policies that interact with rational agents. However, such policies may fail to generalize to the real world, where agents may differ from their simulated counterparts due to unmodeled irrationality and misspecified reward functions. We introduce Epsilon-Robust Multi-Agent Simulation (ERMAS), a robust optimization framework for learning AI policies that are robust to such multi-agent sim-to-real gaps. While existing notions of multi-agent robustness concern perturbations in the actions of agents, we address a novel robustness objective concerning perturbations in the reward functions of agents. ERMAS provides this robustness by anticipating suboptimal behaviors from other agents, formalized as the worst-case epsilon-equilibrium. We show empirically that ERMAS yields robust policies for repeated bimatrix games and optimal taxation problems in economic simulations. In particular, in the two-level RL problem posed by the AI Economist (Zheng et al., 2020), ERMAS learns tax policies that are robust to changes in agent risk aversion, improving social welfare by up to 15% in complex spatiotemporal simulations.
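To make the notion of a worst-case epsilon-equilibrium concrete, the sketch below enumerates the pure-strategy epsilon-Nash equilibria of a two-agent bimatrix game and returns the one that is worst for a separate leader (for example, a planner whose policy is held fixed). It illustrates the definition only, under simplifying assumptions: pure strategies only, a hypothetical leader payoff matrix `L`, and hypothetical helper names `pure_eps_equilibria` and `worst_case_leader_value`. It is not ERMAS's training procedure, which learns policies with reinforcement learning.

```python
import itertools

import numpy as np


def pure_eps_equilibria(A, B, eps):
    """Return all pure profiles (i, j) that are eps-Nash equilibria of the bimatrix game (A, B).

    A[i, j] is the row agent's payoff and B[i, j] the column agent's payoff.
    A profile qualifies if neither agent gains more than eps by deviating alone.
    """
    eqs = []
    n_rows, n_cols = A.shape
    for i, j in itertools.product(range(n_rows), range(n_cols)):
        row_gain = A[:, j].max() - A[i, j]  # best unilateral deviation gain, row agent
        col_gain = B[i, :].max() - B[i, j]  # best unilateral deviation gain, column agent
        if row_gain <= eps and col_gain <= eps:
            eqs.append((i, j))
    return eqs


def worst_case_leader_value(A, B, L, eps):
    """Among the pure eps-equilibria, pick the profile minimizing the leader payoff L[i, j].

    Returns (profile, value), or (None, None) if no pure eps-equilibrium exists.
    """
    eqs = pure_eps_equilibria(A, B, eps)
    if not eqs:
        return None, None
    worst = min(eqs, key=lambda ij: L[ij])
    return worst, float(L[worst])


if __name__ == "__main__":
    # Hypothetical game: both agents share payoffs A; L is the leader's payoff.
    A = np.array([[3.0, 0.0], [2.0, 2.0]])
    B = A.copy()
    L = np.array([[5.0, 4.0], [0.0, 3.0]])
    print(worst_case_leader_value(A, B, L, 0.0))  # ((1, 1), 3.0)
    print(worst_case_leader_value(A, B, L, 1.0))  # ((1, 0), 0.0)
```

With `eps = 0` only the exact equilibria (0, 0) and (1, 1) qualify, so the leader's worst case is 3.0. Allowing agents to be up to 1.0 suboptimal admits (1, 0), which drops the guaranteed leader value to 0.0; this is the kind of suboptimal behavior a robust leader policy has to anticipate.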

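This paper studies how to use reinforcement learning to train a leading agent in multi-agent games. It proposes a no-regret dynamics method that identifies worst-case responses in polynomial time, improving the robustness of the leader's policy, and the method extends to boundedly rational agents. One application is automated mechanism design: experimental results show that the method learns robust mechanisms in both matrix games and complex spatiotemporal games.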

Learning in general-sum games against multiple boundedly rational agents