Diversity plays a crucial role in improving the performance of multi-agent
reinforcement learning (MARL). Currently, many diversity-based methods have
been developed to overcome the drawbacks of excessive parameter sharing in
traditional MARL. However, there remains a lack of a general metric to quantify
policy differences among agents. Such a metric would not only facilitate the
evaluation of the diversity evolution in multi-agent systems, but also provide
guidance for the design of diversity-based MARL algorithms. In this paper, we
propose the multi-agent policy distance (MAPD), a general tool for measuring
policy differences in MARL. By learning the conditional representations of
agents' decisions, MAPD can compute the policy distance between any pair of
agents. Furthermore, we extend MAPD to a customizable version, which can
quantify differences among agent policies on specified aspects. Based on the
online deployment of MAPD, we design a multi-agent dynamic parameter sharing
(MADPS) algorithm as an example application of MAPD. Extensive
experiments demonstrate that our method is effective in measuring differences
in agent policies and specific behavioral tendencies. Moreover, in comparison
to other methods of parameter sharing, MADPS exhibits superior performance.

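Diversity plays a key role in improving the performance of multi-agent
reinforcement learning (MARL). This paper proposes the multi-agent policy
distance (MAPD), a general tool for quantifying policy differences among
agents, and applies it through online deployment to design a multi-agent
dynamic parameter sharing (MADPS) algorithm. Experiments show that our method
is effective in measuring differences in agent policies and specific
behavioral tendencies, and that MADPS outperforms other parameter-sharing
methods.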
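
To make the idea of a pairwise policy distance concrete, the following is a minimal sketch and not the MAPD method itself. MAPD relies on learned conditional representations of agents' decisions, which the abstract does not specify; here that step is replaced by a plainly simpler stand-in, the total variation distance between action distributions averaged over a shared set of observations. The policy interface (a callable returning a list of action probabilities) and all names are hypothetical.

```python
from itertools import combinations


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def pairwise_policy_distance(policies, observations):
    """Return a symmetric matrix d[i][j] of mean distances between policies.

    Assumption: each policy is a callable mapping an observation to a list
    of action probabilities. This is a stand-in metric, not MAPD.
    """
    n = len(policies)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        total = sum(
            total_variation(policies[i](o), policies[j](o))
            for o in observations
        )
        d[i][j] = d[j][i] = total / len(observations)
    return d


# Toy policies over two actions (hypothetical, for illustration only).
def agent_a(obs):
    return [0.5, 0.5]  # uniform, ignores the observation


def agent_b(obs):
    return [0.5, 0.5]  # identical to agent_a


def agent_c(obs):
    return [1.0, 0.0] if obs > 0 else [0.0, 1.0]  # deterministic


distances = pairwise_policy_distance([agent_a, agent_b, agent_c], [-1, 1])
```

In this toy setting the two uniform agents are at distance zero from each other, while the deterministic agent sits at a positive distance from both. A matrix of this kind is the sort of signal a dynamic parameter-sharing scheme could use to decide which agents share parameters, although the grouping rule used by MADPS is not described in the abstract.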