In cooperative multi-agent reinforcement learning (MARL), the standard paradigm is centralised training with decentralised execution, in which a central critic conditions the policies of the cooperative agents on a central state. It has been shown that these methods become less effective when there are large numbers of redundant agents. In the more general case, an environment is likely to contain more agents than are required to solve the task. These redundant agents reduce performance by enlarging both the dimensionality of the state space and the size of the joint policy used to solve the environment. We propose leveraging layerwise relevance propagation (LRP) to separate the learning of the joint value function from the generation of local reward signals, and create a new MARL algorithm: relevance decomposition network (RDN). We find that although the performance of both baselines, VDN and Qmix, degrades as the number of redundant agents increases, RDN is unaffected.
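To make the idea of relevance-based credit assignment concrete, the following is a minimal sketch of the LRP epsilon rule applied to a toy joint value network. It is not the RDN implementation: the two-layer bias-free ReLU network, the random weights, and the names `lrp_linear` and `agent_relevance` are all assumptions made for illustration. The sketch only shows that the joint value can be redistributed over per-agent inputs so that the shares sum (approximately) to the joint value, and that an agent the network ignores receives zero relevance.

```python
import numpy as np

def lrp_linear(a, W, R_out, eps=1e-9):
    """LRP epsilon rule for a bias-free linear layer (illustrative helper).

    a:     layer inputs, shape (n_in,)
    W:     weights, shape (n_in, n_out)
    R_out: relevance of the layer outputs, shape (n_out,)
    Returns the relevance of the inputs, shape (n_in,).
    """
    z = a @ W                                   # pre-activations
    z = z + eps * np.where(z >= 0, 1.0, -1.0)   # stabiliser keeps z away from 0
    s = R_out / z
    return a * (W @ s)

def agent_relevance(x, W1, W2, n_agents):
    """Joint value of a toy network and its per-agent relevance shares."""
    h = np.maximum(0.0, x @ W1)                 # hidden ReLU layer
    q = h @ W2                                  # joint value, shape (1,)
    R_h = lrp_linear(h, W2, q)                  # output -> hidden
    R_x = lrp_linear(x, W1, R_h)                # hidden -> inputs
    # Sum the relevance over each agent's slice of the concatenated input.
    return q[0], R_x.reshape(n_agents, -1).sum(axis=1)

# Toy setup (all values hypothetical): 3 agents, 2 observation features each.
rng = np.random.default_rng(0)
n_agents, obs_dim, hidden = 3, 2, 8
x = rng.normal(size=n_agents * obs_dim)
W1 = rng.normal(size=(n_agents * obs_dim, hidden))
W2 = rng.normal(size=(hidden, 1))
W1[4:6, :] = 0.0                                # the network ignores agent 2

q_joint, relevance = agent_relevance(x, W1, W2, n_agents)
print("joint value:", q_joint)
print("per-agent relevance:", relevance)
```

In this toy setting, the ignored agent's share is exactly zero because every term of its relevance is multiplied by a zero weight. How RDN turns such relevance scores into local reward signals is not specified in the abstract and is not modelled here.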

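Using layerwise relevance propagation, we separate the learning of the joint value function from the generation of local reward signals and propose a new cooperative multi-agent reinforcement learning algorithm: relevance decomposition network. We find that although the performance of VDN and Qmix degrades as the number of redundant agents increases, RDN is unaffected.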

The Challenge of Redundancy in Multi-Agent Value Decomposition