Cooperative multi-agent reinforcement learning (c-MARL) is widely applied in
safety-critical scenarios, so analyzing the robustness of c-MARL models is
critically important. However, robustness certification for c-MARL has not yet
been explored in the community. In this paper, we propose a novel certification
method, the first to take a scalable approach to determining actions with
guaranteed certified bounds in c-MARL. c-MARL certification poses
two key challenges compared with single-agent systems: (i) uncertainty
accumulates as the number of agents increases; (ii) changing the action of a
single agent may have little impact on the global team reward.
These challenges prevent us from directly using existing algorithms. Hence, we
employ a false discovery rate (FDR) controlling procedure that accounts for the
importance of each agent to certify per-state robustness, and we propose a
tree-search-based algorithm to find a lower bound on the global reward under
the minimal certified perturbation. Because our method is general, it can also be
applied in single-agent environments. We empirically show that our
certification bounds are much tighter than those of state-of-the-art RL
certification solutions. We also run experiments on two popular c-MARL
algorithms, QMIX and VDN, in two different environments with two and four
agents. The experimental
results show that our method produces meaningful guaranteed robustness for all
models and environments. Our tool CertifyCMARL is available at
this https URL
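
For readers unfamiliar with FDR control, the following is a minimal sketch of the standard Benjamini-Hochberg step-up procedure, which is one common way to control the false discovery rate across several hypothesis tests (here, hypothetically, one test per agent). It is an illustration only: the abstract does not specify which FDR procedure is used or how agent importance enters it, so the function name `benjamini_hochberg`, the unweighted form, and the example p-values are all assumptions rather than the authors' method.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Standard (unweighted) Benjamini-Hochberg step-up procedure.

    Returns one flag per input p-value: True if that hypothesis is rejected
    while controlling the false discovery rate at level alpha.
    NOTE: illustrative assumption; agent-importance weighting is NOT modeled.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k (1-based) such that p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank

    # Reject every hypothesis whose rank is at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical per-agent p-values for a four-agent team.
    print(benjamini_hochberg([0.02, 0.03, 0.035, 0.5], alpha=0.05))
```

Note the step-up behaviour: with the hypothetical p-values above, the two smallest values each exceed their own thresholds, yet they are still rejected because the third-ranked value passes its threshold.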

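Summary: The paper proposes algorithms based on false discovery rate (FDR) control and tree search for analyzing the robustness of multi-agent systems. Experiments show that the resulting certified bounds are tighter than those of existing methods.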