Multi-agent deep reinforcement learning (MARL) suffers from a lack of commonly-used evaluation tasks and criteria, making comparisons between approaches difficult. In this work, we evaluate and compare three different classes of MARL algorithms (independent learners, centralised training with decentralised execution, and value decomposition) in a diverse range of multi-agent learning tasks. Our results show that (1) algorithm performance depends strongly on environment properties and no algorithm learns efficiently across all learning tasks; (2) independent learners often achieve equal or better performance than more complex algorithms; (3) tested algorithms struggle to solve multi-agent tasks with sparse rewards. We report detailed empirical data, including a reliability analysis, and provide insights into the limitations of the tested algorithms.

本研究提供一个系统化的评估来比较三种不同类别的多智能体深度强化学习算法（独立学习、集中式多智能体策略梯度、价值分解）在多样化的合作多智能体学习任务中的表现，为算法在不同学习任务中的预期性能提供参考，并提供了有关不同学习方法有效性的见解。我们开源了EPyMARL，延伸了PyMARL代码库以包括其他算法，并允许对算法实现细节进行灵活配置，例如参数共享。最后，我们还开源了两个多智能体研究的环境，重点是在稀疏奖励下的协调。

合作任务中的多智能体深度强化学习算法评估