The availability of challenging benchmarks has played a key role in the
recent progress of machine learning. In cooperative multi-agent reinforcement
learning, the StarCraft Multi-Agent Challenge (SMAC) has become a popular
testbed for centralised training with decentralised execution. However, after
years of sustained improvement on SMAC, algorithms now achieve near-perfect
performance. In this work, we conduct new analysis demonstrating that SMAC is
not sufficiently stochastic to require complex closed-loop policies. In
particular, we show that an open-loop policy conditioned only on the timestep
can achieve non-trivial win rates for many SMAC scenarios. To address this
limitation, we introduce SMACv2, a new version of the benchmark where scenarios
are procedurally generated and require agents to generalise to previously
unseen settings (from the same distribution) during evaluation. We show that
these changes ensure the benchmark requires the use of closed-loop policies. We
evaluate state-of-the-art algorithms on SMACv2 and show that it presents
significant challenges not present in the original benchmark. Our analysis
illustrates that SMACv2 addresses the discovered deficiencies of SMAC and can
help benchmark the next generation of MARL methods. Videos of training are
available at this https URL

通过引入新版本的基准测试 SMACv2，可以解决 SMAC 不足的问题并促进多智能体强化学习 (MARL) 算法的发展。