Diversity plays a crucial role in improving the performance of multi-agent reinforcement learning (MARL). Many diversity-based methods have been developed to overcome the drawbacks of excessive parameter sharing in traditional MARL. However, there is still no general metric for quantifying policy differences among agents. Such a metric would not only facilitate evaluating how diversity evolves in multi-agent systems, but also guide the design of diversity-based MARL algorithms. In this paper, we propose the multi-agent policy distance (MAPD), a general tool for measuring policy differences in MARL. By learning conditional representations of agents' decisions, MAPD can compute the policy distance between any pair of agents. Furthermore, we extend MAPD to a customizable version that quantifies differences among agent policies on specified aspects. Based on the online deployment of MAPD, we design a multi-agent dynamic parameter sharing (MADPS) algorithm as an example of MAPD's applications. Extensive experiments demonstrate that our method is effective in measuring differences in agent policies and in specific behavioral tendencies. Moreover, MADPS outperforms other parameter-sharing methods.
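To make the idea of a pairwise policy distance concrete, the following is a minimal sketch and not the MAPD method itself. Instead of the learned conditional representations described above, it assumes each agent's discrete action probabilities are available on a shared batch of observations and uses the mean Jensen-Shannon divergence as a stand-in distance. The function names (`js_divergence`, `policy_distance_matrix`, `sharing_groups`) and the threshold-based grouping rule are hypothetical, chosen only to illustrate how a distance matrix could drive a parameter-sharing decision.

```python
import numpy as np


def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between rows of two (n_obs, n_actions) arrays."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    m = 0.5 * (p + q)
    kl_pm = np.sum(p * np.log(p / m), axis=-1)
    kl_qm = np.sum(q * np.log(q / m), axis=-1)
    return 0.5 * (kl_pm + kl_qm)


def policy_distance_matrix(action_probs):
    """Stand-in distance (assumption, not MAPD): mean JS divergence per agent pair.

    action_probs has shape (n_agents, n_obs, n_actions); every agent is
    evaluated on the same observations. Returns a symmetric matrix.
    """
    n_agents = action_probs.shape[0]
    dist = np.zeros((n_agents, n_agents))
    for i in range(n_agents):
        for j in range(i + 1, n_agents):
            d = js_divergence(action_probs[i], action_probs[j]).mean()
            dist[i, j] = dist[j, i] = d
    return dist


def sharing_groups(dist, threshold):
    """Hypothetical grouping rule: agents closer than `threshold` share parameters.

    Groups are the connected components of the "distance < threshold" graph,
    computed with a small union-find.
    """
    n_agents = dist.shape[0]
    parent = list(range(n_agents))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n_agents):
        for j in range(i + 1, n_agents):
            if dist[i, j] < threshold:
                parent[find(j)] = find(i)

    groups = {}
    for agent in range(n_agents):
        groups.setdefault(find(agent), []).append(agent)
    return sorted(groups.values())


# Toy example: agents 0 and 1 act identically, agent 2 prefers the other action.
probs = np.array([
    [[0.9, 0.1], [0.8, 0.2]],
    [[0.9, 0.1], [0.8, 0.2]],
    [[0.1, 0.9], [0.2, 0.8]],
])
dist = policy_distance_matrix(probs)
groups = sharing_groups(dist, threshold=0.05)
```

In this toy setting the two identical agents fall into one group and the third stays separate. Recomputing the matrix periodically during training and regrouping agents is one plausible reading of "dynamic" parameter sharing, but the actual MADPS procedure may differ.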

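Diversity plays a key role in improving the performance of multi-agent reinforcement learning (MARL). This paper proposes the multi-agent policy distance (MAPD), a general tool for quantifying policy differences among agents, and applies it through online deployment to design a multi-agent dynamic parameter sharing (MADPS) algorithm. Experiments show that the method is effective in measuring differences in agent policies and in specific behavioral tendencies, and that MADPS outperforms other parameter-sharing methods.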

Measuring Policy Distance for Multi-Agent Reinforcement Learning