We study multi-agent reinforcement learning (MARL) for general-sum Markov Games (MGs) with general function approximation. To identify the minimal assumptions needed for sample-efficient learning, we introduce a novel complexity measure for general-sum MGs, the Multi-Agent Decoupling Coefficient (MADC). Using this measure, we propose the first unified algorithmic framework that ensures sample efficiency in learning Nash Equilibrium (NE), Coarse Correlated Equilibrium (CCE), and Correlated Equilibrium (CE) for both model-based and model-free MARL problems with low MADC. We also show that our algorithm achieves sublinear regret comparable to that of existing works. Moreover, our algorithm combines an equilibrium-solving oracle with a single-objective optimization subprocedure that solves for the regularized payoff of each deterministic joint policy. This design avoids solving constrained optimization problems with data-dependent constraints (Jin et al. 2020; Wang et al. 2023) or executing sampling procedures with complex multi-objective optimization problems (Foster et al. 2023), which makes the algorithm more amenable to empirical implementation.
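
The two-stage structure described above can be illustrated with a toy sketch. In this structure, a single-objective subproblem is solved for each deterministic joint policy, and the resulting payoffs are then passed to an equilibrium oracle. Everything in the sketch is an assumption made for illustration and is not the paper's algorithm. It uses a one-step two-player game, a finite set of candidate payoff models, a squared-loss data-fit penalty weighted by a hypothetical parameter `eta`, and Hedge self-play standing in for a CCE oracle. All helper names (`squared_loss`, `regularized_payoffs`, `hedge_cce`) are hypothetical.

```python
import math
from itertools import product

# Toy setting (assumption for illustration): a one-step, two-player game.
# A deterministic joint policy is then just a joint action (a1, a2), and a
# candidate model f is a dict mapping each joint action to a payoff pair.

def squared_loss(model, data):
    """Squared error of `model` on observed (joint_action, payoff_pair) samples."""
    return sum(
        (model[a][i] - r[i]) ** 2
        for a, r in data
        for i in range(len(r))
    )

def regularized_payoffs(models, data, joint_actions, eta):
    """For every deterministic joint policy a and player i, solve the single-objective
    problem max_f [ payoff_i of a under f - eta * loss(f) ] (optimism plus a data-fit penalty)."""
    losses = [squared_loss(f, data) for f in models]
    return {
        a: tuple(
            max(f[a][i] - eta * loss for f, loss in zip(models, losses))
            for i in range(2)
        )
        for a in joint_actions
    }

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def hedge_cce(payoffs, actions1, actions2, rounds=2000, lr=0.1):
    """Stand-in equilibrium oracle: Hedge self-play with full-information feedback.
    The time-averaged joint play distribution is an approximate CCE."""
    w1 = [0.0] * len(actions1)  # cumulative expected payoff of each action, player 1
    w2 = [0.0] * len(actions2)  # same for player 2
    joint = {(a1, a2): 0.0 for a1 in actions1 for a2 in actions2}
    for _ in range(rounds):
        p1 = softmax([lr * w for w in w1])
        p2 = softmax([lr * w for w in w2])
        # Accumulate this round's product distribution into the time average.
        for i, a1 in enumerate(actions1):
            for j, a2 in enumerate(actions2):
                joint[(a1, a2)] += p1[i] * p2[j] / rounds
        # Score each player's actions against the opponent's current mixture.
        for i, a1 in enumerate(actions1):
            w1[i] += sum(q * payoffs[(a1, a2)][0] for q, a2 in zip(p2, actions2))
        for j, a2 in enumerate(actions2):
            w2[j] += sum(q * payoffs[(a1, a2)][1] for q, a1 in zip(p1, actions1))
    return joint

# Usage: two candidate models, with data generated by the first one.
actions = ["C", "D"]
joint_actions = list(product(actions, actions))
f_true = {("C", "C"): (0.6, 0.6), ("C", "D"): (0.0, 1.0),
          ("D", "C"): (1.0, 0.0), ("D", "D"): (0.2, 0.2)}
f_flat = {a: (0.5, 0.5) for a in joint_actions}
data = [(("C", "C"), (0.6, 0.6)), (("D", "D"), (0.2, 0.2))]

payoffs = regularized_payoffs([f_true, f_flat], data, joint_actions, eta=10.0)
cce = hedge_cce(payoffs, actions, actions)
print(max(cce, key=cce.get))  # → ('D', 'D')
```

With `eta=10.0`, the data penalize `f_flat` heavily. The regularized payoffs therefore reduce to those of `f_true`, which is a prisoner's dilemma, and the averaged play concentrates on `("D", "D")`. Setting `eta=0.0` instead yields purely optimistic payoffs, for example `(0.5, 0.5)` at `("D", "D")`. In this sketch, the data enter only through the penalty term, so no step optimizes over a data-dependent constraint set.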


Sample-Efficient Multi-Agent Reinforcement Learning: An Optimization Perspective