In multi-agent reinforcement learning (MARL), self-interested agents attempt to establish equilibrium and achieve coordination depending on the game structure. However, existing MARL approaches are mostly bound by the simultaneous actions of all agents in the Markov game (MG) framework, and few works consider how equilibrium strategies can form through asynchronous action coordination. Given the advantages of the Stackelberg equilibrium (SE) over the Nash equilibrium, we construct a spatio-temporal sequential decision-making structure derived from the MG and propose an N-level policy model based on a conditional hypernetwork shared by all agents. This approach allows asymmetric training with symmetric execution, with each agent responding optimally conditioned on the decisions made by its superior agents. Agents can learn heterogeneous SE policies while still sharing parameters, which reduces learning and storage costs and improves scalability as the number of agents grows. Experiments demonstrate that our method effectively converges to SE policies in repeated matrix game scenarios and performs well in highly complex settings, including cooperative and mixed tasks.
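To make the leader-follower idea concrete, here is a minimal sketch that computes a pure-strategy Stackelberg outcome in a small tabulated game by backward induction: agent k picks its action given the actions of agents 0..k-1, anticipating that every later agent will best-respond in turn. It illustrates the equilibrium concept only and is not the paper's learning method. The 2x2 payoff table (a textbook-style example), the function names, and the first-maximizer tie-breaking rule are assumptions made for illustration.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Joint = Tuple[int, ...]
Payoff = Callable[[Joint], Sequence[float]]


def stackelberg_backward_induction(n_actions: Sequence[int], payoff: Payoff,
                                   prefix: Joint = ()) -> Joint:
    """Pure-strategy Stackelberg outcome for agents moving in order 0, 1, ..., N-1.

    Agent k observes the actions of its superiors (prefix) and picks the action
    that maximizes its own payoff, assuming every later agent does the same.
    """
    k = len(prefix)
    if k == len(n_actions):  # every agent has acted
        return prefix
    best_joint: Joint = ()
    best_value = float("-inf")
    for a in range(n_actions[k]):
        # Joint action reached if agent k plays `a` and all subordinates best-respond.
        joint = stackelberg_backward_induction(n_actions, payoff, prefix + (a,))
        value = payoff(joint)[k]
        if value > best_value:  # strict '>' keeps the first maximizer on ties (assumption)
            best_value, best_joint = value, joint
    return best_joint


def pure_nash_equilibria(n_actions: Sequence[int], payoff: Payoff) -> List[Joint]:
    """All joint actions from which no agent gains by deviating unilaterally."""
    equilibria = []
    for joint in product(*(range(n) for n in n_actions)):
        rewards = payoff(joint)
        if all(payoff(joint[:i] + (b,) + joint[i + 1:])[i] <= rewards[i]
               for i in range(len(n_actions)) for b in range(n_actions[i])):
            equilibria.append(joint)
    return equilibria


# Hypothetical 2x2 game: GAME[(leader, follower)] = (leader payoff, follower payoff).
GAME = {(0, 0): (1, 1), (0, 1): (3, 0),
        (1, 0): (0, 0), (1, 1): (2, 1)}


def payoff(joint: Joint) -> Tuple[int, int]:
    return GAME[joint]


se = stackelberg_backward_induction((2, 2), payoff)
print(se, payoff(se))                        # (1, 1) (2, 1)
print(pure_nash_equilibria((2, 2), payoff))  # [(0, 0)]
```

In this game the leader's first action strictly dominates, so the only pure Nash equilibrium is (0, 0) with payoffs (1, 1). Moving first lets the leader steer the follower to (1, 1) with payoffs (2, 1), which is better for the leader and no worse for the follower. The same recursion extends to N levels, since each call handles one agent and defers to the agents below it.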

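This paper proposes an N-level policy model with asynchronous action coordination based on the Stackelberg equilibrium. By sharing a conditional hypernetwork, agents can learn different policies without increasing learning or storage costs and without sacrificing scalability. Experiments show that the model converges to the Stackelberg equilibrium in repeated game scenarios and also performs very well on cooperative and mixed tasks.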
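The following is a minimal, hypothetical sketch of how one shared conditional hypernetwork can still yield different per-agent policies. It assumes a single linear hypernetwork that maps a condition vector (a one-hot agent index plus one-hot encodings of the superior agents' actions) to the weights of a linear softmax policy head. The class name `ConditionalHyperPolicy`, the condition encoding, and the linear architecture are illustrative assumptions rather than the paper's actual network, and the sketch contains no training.

```python
import math
import random
from typing import List, Sequence


class ConditionalHyperPolicy:
    """Hypothetical sketch: one hypernetwork, shared by all agents, generates the
    weights of each agent's linear softmax policy head from a condition vector
    (one-hot agent index + one-hot actions of that agent's superiors)."""

    def __init__(self, n_agents: int, n_actions: int, obs_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.n_agents, self.n_actions, self.obs_dim = n_agents, n_actions, obs_dim
        # Agent-id slots, then one block of n_actions slots per possible superior level.
        self.cond_dim = n_agents + (n_agents - 1) * n_actions
        out_dim = n_actions * obs_dim  # flattened weight matrix of the policy head
        # All policy parameters live here and are shared by every agent.
        self.hyper = [[rng.gauss(0.0, 0.1) for _ in range(self.cond_dim)]
                      for _ in range(out_dim)]

    def condition(self, agent: int, superior_actions: Sequence[int]) -> List[float]:
        c = [0.0] * self.cond_dim
        c[agent] = 1.0
        for level, a in enumerate(superior_actions):
            c[self.n_agents + level * self.n_actions + a] = 1.0
        return c

    def policy(self, agent: int, obs: Sequence[float],
               superior_actions: Sequence[int]) -> List[float]:
        # Agent k is conditioned on the actions of agents 0..k-1.
        assert len(superior_actions) == agent
        c = self.condition(agent, superior_actions)
        # Hypernetwork forward pass: generate this agent's policy-head weights.
        w = [sum(h * x for h, x in zip(row, c)) for row in self.hyper]
        logits = [sum(w[a * self.obs_dim + j] * obs[j] for j in range(self.obs_dim))
                  for a in range(self.n_actions)]
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def act(self, observations: Sequence[Sequence[float]], rng: random.Random) -> List[int]:
        # Sample in leader-to-follower order so each agent sees its superiors' actions.
        actions: List[int] = []
        for k in range(self.n_agents):
            probs = self.policy(k, observations[k], actions)
            actions.append(rng.choices(range(self.n_actions), weights=probs)[0])
        return actions
```

Because the agent index and the superior actions enter only through the condition vector, each agent receives its own policy head while the hypernetwork parameters stay shared. This sketch does not model the paper's asymmetric-training, symmetric-execution scheme.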

Inducing Stackelberg Equilibrium through Spatio-Temporal Sequential Decision-Making in Multi-Agent Reinforcement Learning