Diversity plays a crucial role in improving the performance of multi-agent
reinforcement learning (MARL). Currently, many diversity-based methods have
been developed to overcome the drawbacks of excessive parameter sharing in
traditional MARL. However, there remains a lack of a general metric to quantify
policy differences among agents. Such a metric would not only facilitate the
evaluation of the diversity evolution in multi-agent systems, but also provide
guidance for the design of diversity-based MARL algorithms. In this paper, we
propose the multi-agent policy distance (MAPD), a general tool for measuring
policy differences in MARL. By learning the conditional representations of
agents' decisions, MAPD can compute the policy distance between any pair of
agents. Furthermore, we extend MAPD to a customizable version, which can
quantify differences among agent policies on specified aspects. Based on the
online deployment of MAPD, we design a multi-agent dynamic parameter sharing
(MADPS) algorithm as an example application of MAPD. Extensive
experiments demonstrate that our method is effective in measuring differences
in agent policies and specific behavioral tendencies. Moreover, in comparison
to other methods of parameter sharing, MADPS exhibits superior performance.
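
As a rough illustration of what a pairwise policy distance and distance-driven parameter sharing could look like, consider the sketch below. It is not MAPD or MADPS: instead of learned conditional representations, it substitutes the Jensen-Shannon distance between action distributions, averaged over a shared set of observations, and it groups agents by a simple threshold. All names (`js_divergence`, `policy_distance_matrix`, `group_agents`), the toy policies, and the threshold value are assumptions made for illustration.

```python
import math
from itertools import combinations


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    # Clamp at zero to guard against tiny negative values from rounding.
    return max(0.0, 0.5 * kl(p, m) + 0.5 * kl(q, m))


def policy_distance_matrix(policies, observations):
    """Average Jensen-Shannon distance between every pair of policies.

    policies: list of callables mapping an observation to a list of
    action probabilities (hypothetical interface).
    """
    n = len(policies)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        total = sum(
            math.sqrt(js_divergence(policies[i](o), policies[j](o)))
            for o in observations
        )
        dist[i][j] = dist[j][i] = total / len(observations)
    return dist


def group_agents(dist, threshold):
    """Put agents in the same sharing group when their distance is below
    the threshold (connected components via union-find)."""
    n = len(dist)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if dist[i][j] < threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy policies over two actions; agents 0 and 1 behave identically.
policies = [
    lambda obs: [0.9, 0.1],
    lambda obs: [0.9, 0.1],
    lambda obs: [0.1, 0.9],
]
observations = [0, 1, 2]

dist = policy_distance_matrix(policies, observations)
groups = group_agents(dist, threshold=0.1)
print(groups)
```

In this toy setting, agents 0 and 1 would share one set of parameters while agent 2 keeps its own. Recomputing the matrix during training and regrouping is one way to make the sharing dynamic, under the assumptions stated above.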

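Diversity plays a key role in improving the performance of multi-agent reinforcement learning (MARL). This paper proposes the multi-agent policy distance (MAPD), a general tool for quantifying policy differences among agents, and applies it through online deployment to design a multi-agent dynamic parameter sharing (MADPS) algorithm. Experiments show that the method is effective at measuring differences in agent policies and specific behavioral tendencies, and that MADPS outperforms other parameter sharing methods.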

Measuring Policy Distance for Multi-Agent Reinforcement Learning

We compare policy differences across institutions by embedding
representations of the entire legal corpus of each institution and the
vocabulary shared across all corpora into a continuous vector space. We apply
our method, Gov2Vec, to Supreme Court opinions, Presidential actions, and
official summaries of Congressional bills. The model discerns meaningful
differences between government branches. We also learn representations for more
fine-grained word sources: individual Presidents and (2-year) Congresses. The
similarities between learned representations of Congresses over time and
sitting Presidents are negatively correlated with the bill veto rate, and the
temporal ordering of Presidents and Congresses is learned implicitly from text
alone. With the resulting vectors, we answer questions such as: how do Obama
and the 113th House differ in addressing climate change and how does this vary
from environmental or economic perspectives? Our work illustrates
vector-arithmetic-based investigations of complex relationships between word
sources based on their texts. We are extending this to create a more
comprehensive legal semantic map.
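
The vector-arithmetic style of query described above can be pictured with a small sketch. It assumes source vectors and word vectors already live in one shared space, uses hand-written toy vectors rather than anything learned from text, and answers a query by adding a source vector to a topic word vector and ranking the remaining words by cosine similarity. The names `cosine`, `nearest_words`, `source_a`, `source_b`, and the toy vocabulary are hypothetical and are not taken from Gov2Vec.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def add(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]


def nearest_words(query, word_vectors, exclude=(), k=1):
    """Return the k words whose vectors are most similar to the query."""
    candidates = [w for w in word_vectors if w not in exclude]
    candidates.sort(key=lambda w: cosine(query, word_vectors[w]), reverse=True)
    return candidates[:k]


# Toy 3-dimensional vectors, invented purely for illustration.
word_vectors = {
    "climate": [1.0, 0.0, 0.0],
    "regulate": [0.8, 0.6, 0.0],
    "subsidy": [0.8, 0.0, 0.6],
}
source_vectors = {
    "source_a": [0.0, 1.0, 0.0],
    "source_b": [0.0, 0.0, 1.0],
}

# "How does each source address the topic word?"
for name, vec in source_vectors.items():
    query = add(vec, word_vectors["climate"])
    print(name, nearest_words(query, word_vectors, exclude={"climate"}))
```

With these toy vectors, the two sources pull the same topic word toward different neighbors, which is the kind of contrast the queries above are meant to surface.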

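The paper compares policy differences across institutions by embedding them in a single shared vector space, finds meaningful differences between institutions, uses the learned vectors to answer specific questions, and is being extended into a more comprehensive legal semantic map.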