This paper presents a recursive reasoning formalism of Bayesian optimization (BO), which we call Recursive Reasoning-Based BO (R2-B2). It models the reasoning process in the interactions between boundedly rational, self-interested agents in repeated games, where each agent's payoff function is unknown, complex, and costly to evaluate. Our R2-B2 algorithm is general in that it does not constrain the relationship among the payoff functions of different agents, and can thus be applied to various types of games such as constant-sum, general-sum, and common-payoff games. We prove that by reasoning at level 2 or higher and at one level higher than the other agents, our R2-B2 agent can achieve faster asymptotic convergence to no regret than an agent that does not use recursive reasoning. We also propose R2-B2-Lite, a computationally cheaper variant of R2-B2 that comes at the expense of a weaker convergence guarantee. The performance and generality of R2-B2 are empirically demonstrated using synthetic games, adversarial machine learning, and multi-agent reinforcement learning.
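The abstract does not spell out how level-k reasoning is computed, so the following is only a minimal sketch of the general idea for two agents with finite action sets. Its assumptions are ours, not the paper's: each agent's payoff over joint actions is modeled by a zero-mean Gaussian process (GP) with a unit-variance RBF kernel, actions are scored with a GP-UCB acquisition (posterior mean plus `sqrt(beta)` times the posterior standard deviation), the reasoning agent can form a UCB matrix for its opponent as well, and the helper names `gp_ucb`, `ucb_matrix`, and `level_k_action` are hypothetical. It is not the authors' implementation and does not cover R2-B2-Lite.

```python
import numpy as np


def gp_ucb(X_obs, y_obs, X_query, beta=2.0, length_scale=1.0, noise=1e-3):
    """UCB = posterior mean + sqrt(beta) * posterior std of a zero-mean GP
    with a unit-variance RBF kernel (hypothetical helper for this sketch)."""
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / length_scale**2)

    K = rbf(X_obs, X_obs) + noise * np.eye(len(X_obs))
    K_s = rbf(X_query, X_obs)                      # shape (n_query, n_obs)
    mean = K_s @ np.linalg.solve(K, y_obs)
    v = np.linalg.solve(K, K_s.T)                  # shape (n_obs, n_query)
    var = 1.0 - np.sum(K_s * v.T, axis=1)          # prior variance is 1 for this kernel
    return mean + np.sqrt(beta) * np.sqrt(np.maximum(var, 0.0))


def ucb_matrix(X_obs, y_obs, acts0, acts1, beta=2.0):
    """Evaluate gp_ucb on every joint action (a0, a1); rows index agent 0's action."""
    grid = np.array([[a0, a1] for a0 in acts0 for a1 in acts1], dtype=float)
    return gp_ucb(X_obs, y_obs, grid, beta).reshape(len(acts0), len(acts1))


def level_k_action(k, ucb, level0, me, rng=None):
    """Index of the action chosen by agent `me` (0 or 1) reasoning at level k.

    ucb[i]    : agent i's UCB matrix over joint actions, rows = agent 0's action.
    level0[i] : agent i's level-0 mixed strategy (a probability vector).
    """
    other = 1 - me
    U = ucb[me] if me == 0 else ucb[me].T          # rows = my action, cols = opponent's
    if k == 0:
        # Level 0: no reasoning about the opponent; sample the mixed strategy.
        rng = rng if rng is not None else np.random.default_rng()
        return int(rng.choice(len(level0[me]), p=level0[me]))
    if k == 1:
        # Level 1: maximize expected UCB against the opponent's level-0 strategy.
        return int(np.argmax(U @ level0[other]))
    # Level k >= 2: simulate the opponent reasoning at level k-1, then best-respond.
    opp = level_k_action(k - 1, ucb, level0, other, rng)
    return int(np.argmax(U[:, opp]))
```

For example, if both agents share the UCB matrix `[[1, 0], [0, 2]]` (a common-payoff game) and the level-0 strategies are `(0.5, 0.5)` for agent 0 and `(0.9, 0.1)` for agent 1, then agent 0 picks action 0 at level 1, action 1 at level 2, and action 0 at level 3. The choice changes because each level assumes a different model of the opponent, which is the mechanism behind reasoning "one level higher than the other agents".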


R2-B2: Recursive Reasoning-Based Bayesian Optimization for No-Regret Learning in Games