We present a novel Diffusion Offline Multi-agent Model (DOM2) for offline
Multi-Agent Reinforcement Learning (MARL). Different from existing algorithms
that rely mainly on conservatism in policy design, DOM2 enhances policy
expressiveness and diversity based on diffusion. Specifically, we incorporate a
diffusion model into the policy network and propose a trajectory-based
data-augmentation scheme in training. These key ingredients make our algorithm
more robust to environment changes and achieve significant improvements in
performance, generalization and data-efficiency. Our extensive experimental
results demonstrate that DOM2 outperforms existing state-of-the-art methods in
multi-agent particle and multi-agent MuJoCo environments, and generalizes
significantly better in shifted environments thanks to its high expressiveness
and diversity. Furthermore, DOM2 shows superior data efficiency and can achieve
state-of-the-art performance with $20+$ times less data compared to existing
algorithms.

本文提出了一种基于扩散的离线多智能体模型（DOM2），采用轨迹数据增广方案，可以应对环境变化，达到更好的性能、泛化能力和数据效率。实验结果表明，DOM2 在多智能体环境中和 shifted environments 下都比现有算法表现更好，并拥有更强的数据效率。