In cooperative multi-agent reinforcement learning (MARL), the standard paradigm
is centralised training with decentralised execution, in which a central critic
conditions the policies of the cooperative agents on a central state. It has
been shown that these methods become less effective when there are large
numbers of redundant agents. In the more general case, an environment is likely
to contain more agents than are required to solve the task. These redundant
agents reduce performance by enlarging both the dimensionality of the state
space and the size of the joint policy used to solve the environment. We
propose leveraging layerwise relevance propagation (LRP) to instead separate
the learning of the joint value function from the generation of local reward
signals, and create a new MARL algorithm: relevance decomposition network
(RDN). We find that although the performance of both baselines, VDN and Qmix,
degrades with the number of redundant agents, RDN is unaffected.
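
To illustrate how relevance propagation can turn a joint value into per-agent signals, the following is a minimal sketch of the LRP epsilon rule on a single bias-free linear unit. It is not the RDN architecture: the function name `lrp_linear`, the per-agent inputs and the weights are all hypothetical, chosen only to show that an agent contributing nothing to the joint value receives zero relevance.

```python
def lrp_linear(inputs, weights, target, eps=1e-9):
    """Split `target` over the inputs of a bias-free linear unit (LRP epsilon rule).

    Each input receives a share proportional to its contribution w_i * x_i.
    """
    contributions = [w * x for w, x in zip(weights, inputs)]
    total = sum(contributions)
    # Small stabiliser with the same sign as the total avoids division by zero.
    denom = total + (eps if total >= 0 else -eps)
    return [c / denom * target for c in contributions]


# Hypothetical setup: three agents feed one linear joint-value head.
# The third agent is redundant here (its input is zero).
agent_inputs = [2.0, 1.0, 0.0]
mixing_weights = [0.5, 1.0, 0.7]
joint_value = sum(w * x for w, x in zip(mixing_weights, agent_inputs))

relevance = lrp_linear(agent_inputs, mixing_weights, joint_value)
```

In this toy case the relevances sum (up to the stabiliser) to the joint value, and the redundant agent's share is exactly zero.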

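Using layerwise relevance propagation, we separate the learning of the joint value function from the generation of local reward signals and propose a new cooperative multi-agent reinforcement learning algorithm: the relevance decomposition network (RDN). We find that although the performance of VDN and Qmix degrades as the number of redundant agents increases, RDN is unaffected.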

The challenge of redundancy on multi-agent value factorisation

This paper introduces four new algorithms that can be used for tackling
multi-agent reinforcement learning (MARL) problems occurring in cooperative
settings. All algorithms are based on the Deep Quality-Value (DQV) family of
algorithms, a set of techniques that have proven successful on single-agent
reinforcement learning (SARL) problems. The key idea of DQV
algorithms is to jointly learn an approximation of the state-value function
$V$, alongside an approximation of the state-action value function $Q$. We
follow this principle and generalise these algorithms by introducing two fully
decentralised MARL algorithms (IQV and IQV-Max) and two algorithms based on the
centralised training with decentralised execution paradigm (QVMix and
QVMix-Max). We compare our algorithms with state-of-the-art MARL techniques on
the popular StarCraft Multi-Agent Challenge (SMAC) environment. QVMix and
QVMix-Max are competitive with well-known MARL techniques such as QMIX and
MAVEN, and QVMix outperforms them on some of the tested environments,
performing best overall. We hypothesise that this is because QVMix suffers less
from the overestimation bias of the $Q$ function.
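
The idea of jointly learning $V$ and $Q$ can be sketched through the temporal-difference targets used for each estimate. The abstract does not spell these targets out, so the forms below are an assumption based on the single-agent DQV and DQV-Max formulations, and the function names are hypothetical. In this sketch, DQV regresses both $V$ and $Q$ towards a target bootstrapped from $V$, while DQV-Max bootstraps the $V$ target from the maximum of $Q$.

```python
def dqv_targets(reward, next_v, gamma=0.99, done=False):
    """Assumed DQV targets: V and Q share the target r + gamma * V(s')."""
    bootstrap = 0.0 if done else gamma * next_v
    target = reward + bootstrap
    return target, target  # (V target, Q target)


def dqv_max_targets(reward, next_v, next_q_values, gamma=0.99, done=False):
    """Assumed DQV-Max targets.

    V target: r + gamma * max_a Q(s', a)
    Q target: r + gamma * V(s')
    """
    if done:
        return reward, reward
    v_target = reward + gamma * max(next_q_values)
    q_target = reward + gamma * next_v
    return v_target, q_target


# Example transition with reward 1.0, V(s') = 2.0 and Q(s', .) = [0.0, 4.0].
v_t, q_t = dqv_targets(1.0, 2.0, gamma=0.5)
v_max_t, q_max_t = dqv_max_targets(1.0, 2.0, [0.0, 4.0], gamma=0.5)
```

Because the $Q$ target in both variants bootstraps from $V$ rather than from a maximum over $Q$, this is consistent with the hypothesis that such methods suffer less from overestimation bias.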

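This paper introduces four new algorithms, IQV, IQV-Max, QVMix and QVMix-Max, for multi-agent reinforcement learning (MARL) problems in cooperative settings. The authors compare these algorithms with existing MARL techniques and show that QVMix performs best overall on the tested environments; they hypothesise that this is because it suffers less from the overestimation bias of the $Q$ function.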