Extending deep Q-learning to cooperative multi-agent settings is challenging due to the exponential growth of the joint action space, the non-stationary environment, and the credit assignment problem. Value decomposition allows deep Q-learning to be applied at the joint agent level, at the cost of reduced expressivity. Building on this line of work, we propose PairVDN, a novel method that decomposes the value function into a collection of pair-wise, rather than per-agent, functions. This improves expressivity at the cost of requiring a more complex (but still efficient) dynamic programming maximisation algorithm. Unlike past approaches such as VDN and QMIX, our method can represent value functions that cannot be expressed as a monotonic combination of per-agent functions. We implement a novel many-agent cooperative environment, Box Jump, and demonstrate improved performance over these baselines in this setting. We open-source our code and environment at https://github.com/zzbuzzard/PairVDN.
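To illustrate why a pair-wise decomposition still admits efficient maximisation, the sketch below assumes (hypothetically; the pairing used by PairVDN may differ) that agents are arranged in a chain, so the joint value is Q(a) = sum_i Q_i(a_i, a_{i+1}). Under that assumption the greedy joint action can be found by Viterbi-style dynamic programming in O(n |A|^2) time instead of enumerating all |A|^n joint actions. The function names `chain_argmax` and `brute_force_argmax` are illustrative only and are not taken from the PairVDN code base.

```python
import itertools

import numpy as np


def chain_argmax(pair_q):
    """Maximise Q(a) = sum_i pair_q[i][a_i, a_{i+1}] by dynamic programming.

    Assumption (hypothetical): agents form a chain, so pair_q is a list of
    n - 1 arrays of shape (A, A) for n agents with A actions each.
    Returns (best_value, best_joint_action).
    """
    num_actions = pair_q[0].shape[0]
    # best[a] = highest value of the chain prefix whose last agent takes action a.
    best = np.zeros(num_actions)
    back = []  # back[i][a] = best action of agent i, given agent i + 1 takes a.
    for q in pair_q:
        # scores[a_i, a_next] = best prefix value for a_i plus the pair term.
        scores = best[:, None] + q
        back.append(np.argmax(scores, axis=0))
        best = np.max(scores, axis=0)
    # Backtrack from the last agent's best action to recover the joint action.
    actions = [int(np.argmax(best))]
    for b in reversed(back):
        actions.append(int(b[actions[-1]]))
    actions.reverse()
    return float(best.max()), actions


def brute_force_argmax(pair_q):
    """Exhaustive search over all A^n joint actions, for checking the DP."""
    num_actions = pair_q[0].shape[0]
    n_agents = len(pair_q) + 1
    best_val, best_act = -np.inf, None
    for joint in itertools.product(range(num_actions), repeat=n_agents):
        val = sum(q[joint[i], joint[i + 1]] for i, q in enumerate(pair_q))
        if val > best_val:
            best_val, best_act = float(val), list(joint)
    return best_val, best_act


# Coordination payoff: each neighbouring pair scores 1 only if both agents match.
print(chain_argmax([np.eye(2), np.eye(2)]))  # prints (2.0, [0, 0, 0])

# Random pair-wise values: the DP optimum matches exhaustive search.
rng = np.random.default_rng(0)
pair_q = [rng.normal(size=(3, 3)) for _ in range(4)]  # 5 agents, 3 actions each
assert np.isclose(chain_argmax(pair_q)[0], brute_force_argmax(pair_q)[0])
```

The coordination payoff in the usage example also shows the expressivity gap: for two agents, Q(a_1, a_2) = 1 if a_1 = a_2 and 0 otherwise is a single pair-wise term, but it cannot be written as a monotonic combination of per-agent utilities, since Q(0, 0) > Q(1, 0) would require agent 1's utility to prefer action 0, while Q(1, 1) > Q(0, 1) would require it to prefer action 1.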

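This work addresses the limitations that deep Q-learning faces in cooperative multi-agent settings, namely the complex joint action space and the credit assignment problem. We propose PairVDN, a new value function decomposition method that improves expressivity by decomposing the value function pair-wise rather than per agent, and that shows performance gains over the conventional VDN and QMIX baselines. The method requires a more complex dynamic programming maximisation algorithm, but achieves notable improvements in the experimental environment, Box Jump.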

PairVDN - Pair-wise Decomposed Value Functions