Training agents in multi-agent competitive games presents significant
challenges due to their intricate nature. These challenges are exacerbated by
dynamics influenced not only by the environment but also by opponents'
strategies. Existing methods often struggle with slow convergence and
instability. To address these issues, we use imitation learning to model and
anticipate opponents' behavior, aiming to reduce uncertainty about the game
dynamics. Our key contributions are: (i) a new multi-agent imitation learning
model for predicting opponents' next moves, which works even when opponents'
actions are hidden and only local observations are available; (ii) a new
multi-agent reinforcement learning algorithm that combines our imitation
learning model and policy training into a single training process; and (iii)
extensive experiments in three challenging game environments, including an
advanced version of the StarCraft Multi-Agent Challenge (i.e., SMACv2).
Experimental results show that our approach achieves superior performance
compared to existing state-of-the-art multi-agent RL algorithms.

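We propose a new multi-agent imitation learning model for predicting opponents' next moves, together with a multi-agent reinforcement learning algorithm that combines this model and policy training into a single training process. Extensive experiments in three challenging game environments show that our approach outperforms existing multi-agent reinforcement learning algorithms.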

Mimicking To Dominate: Imitation Learning Strategies for Success in Multiagent Competitive Games

We introduce a class of networked Markov potential games where agents are
associated with nodes in a network. Each agent has its own local potential
function, and the reward of each agent depends only on the states and actions
of agents within a $\kappa$-hop neighborhood. In this context, we propose a
localized actor-critic algorithm. The algorithm is scalable since each agent
uses only local information and does not need access to the global state.
Further, the algorithm overcomes the curse of dimensionality through the use of
function approximation. Our main results provide finite-sample guarantees up to
a localization error and a function approximation error. Specifically, we
achieve an $\tilde{\mathcal{O}}(\epsilon^{-4})$ sample complexity measured by
the averaged Nash regret. This is the first finite-sample bound for multi-agent
competitive games that does not depend on the number of agents.
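
To make the locality assumption concrete, the sketch below computes the
$\kappa$-hop neighborhood of an agent with a breadth-first search. It is an
illustration only, not the paper's algorithm: the function name
`k_hop_neighborhood`, the adjacency-dict representation, the example graph,
and the convention that an agent belongs to its own neighborhood are all
assumptions made here.

```python
from collections import deque


def k_hop_neighborhood(adjacency, source, kappa):
    """Return the set of nodes within `kappa` hops of `source`.

    `adjacency` maps each node to an iterable of its neighbors.
    Assumption: the source node counts as part of its own neighborhood.
    """
    visited = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == kappa:
            # Nodes at distance kappa are included but not expanded further.
            continue
        for nbr in adjacency.get(node, ()):
            if nbr not in visited:
                visited.add(nbr)
                frontier.append((nbr, dist + 1))
    return visited


# Hypothetical network: five agents on a line, 0 - 1 - 2 - 3 - 4.
line_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

# Under the locality assumption, agent 2's reward with kappa = 1 would
# depend only on the states and actions of these agents.
local_agents = k_hop_neighborhood(line_graph, 2, 1)
```

On a sparse network the size of this set is governed by $\kappa$ and the node
degrees rather than by the total number of agents, which is the intuition
behind an algorithm that uses only local information.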

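This work proposes a networked Markov potential game model and a localized actor-critic algorithm that uses function approximation to overcome the curse of dimensionality, and it gives finite-sample guarantees up to a localization error and a function approximation error. Experiments show that the algorithm can effectively handle multi-agent competitive games.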