We expose the danger of reward poisoning in offline multi-agent reinforcement learning (MARL), where an attacker can modify the reward vectors received by different learners in an offline data set while incurring a poisoning cost. Given the poisoned data set, all rational learners using some confidence-bound-based MARL algorithm will infer that a target policy (chosen by the attacker and not necessarily a solution concept of the original game) is the Markov perfect dominant strategy equilibrium of the underlying Markov game, so they will adopt this potentially damaging target policy in the future. We characterize the exact conditions under which the attacker can install a target policy. We further show how the attacker can formulate a linear program to minimize its poisoning cost. Our work highlights the need for MARL algorithms that are robust to adversarial attacks.
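The cost-minimizing attack can be illustrated on a drastically simplified case. This is a minimal sketch under stated assumptions, not the linear program from this work. It uses a single state, two players, and exactly known rewards, with no offline data and no confidence bounds. The attacker wants the target joint action to be a strict dominant strategy equilibrium with margin `margin` (the one-state analogue of the dominance requirement) and pays the L1 norm of its reward changes. Under these assumptions the LP splits into one small problem per player and per opponent action. Each problem is solved exactly by checking the breakpoints of a convex piecewise-linear cost. The names `poison_bimatrix`, `_poison_group`, and `margin` are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def _poison_group(values: List[float], target: int, margin: float) -> Tuple[List[float], float]:
    """Minimal-L1 edit so that values[target] >= values[j] + margin for every j != target.

    Hypothetical helper: the new target value t is the only free choice; every other
    entry is lowered to at most t - margin. The resulting cost is convex and piecewise
    linear in t, so its minimum is attained at one of the breakpoints checked below.
    """
    others = [v for j, v in enumerate(values) if j != target]
    candidates = [values[target]] + [v + margin for v in others]

    def cost(t: float) -> float:
        # |change to the target entry| + total amount the other entries must be lowered
        return abs(t - values[target]) + sum(max(0.0, v - (t - margin)) for v in others)

    best_t = min(candidates, key=cost)
    new_values = [best_t if j == target else min(v, best_t - margin)
                  for j, v in enumerate(values)]
    return new_values, cost(best_t)


def poison_bimatrix(r1: Matrix, r2: Matrix, target: Tuple[int, int],
                    margin: float) -> Tuple[Matrix, Matrix, float]:
    """Toy attack (assumption): one state, known rewards, L1 cost, strict dominance by `margin`."""
    n1, n2 = len(r1), len(r1[0])
    t1, t2 = target
    new1 = [row[:] for row in r1]
    new2 = [row[:] for row in r2]
    total = 0.0
    # Player 1: for every action a2 of player 2, action t1 must beat each other a1 by `margin`.
    for a2 in range(n2):
        column, c = _poison_group([r1[a1][a2] for a1 in range(n1)], t1, margin)
        for a1 in range(n1):
            new1[a1][a2] = column[a1]
        total += c
    # Player 2: for every action a1 of player 1, action t2 must beat each other a2 by `margin`.
    for a1 in range(n1):
        new2[a1], c = _poison_group(r2[a1], t2, margin)
        total += c
    return new1, new2, total


# Prisoner's-dilemma-style payoffs (action 0 = cooperate, 1 = defect); defect is dominant.
r1 = [[3, 0], [5, 1]]
r2 = [[3, 5], [0, 1]]
new1, new2, cost = poison_bimatrix(r1, r2, target=(0, 0), margin=1.0)
print(new1, new2, cost)
```

In this example, the sketch installs mutual cooperation by lowering each player's temptation payoff from 5 to 2 and its mutual-defection payoff from 1 to -1. The total L1 cost is 10. In the full offline Markov-game setting, the decoupling no longer holds in general, so a generic LP solver would be needed.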

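This work studies reward poisoning attacks in multi-agent reinforcement learning and shows that an attacker can install a target policy as the Markov perfect dominant strategy equilibrium, so that rational agents follow the course of action the attacker intends. The attack is comparatively easy to carry out and applies to data sets with a variety of structures and to a range of MARL agent algorithms. We also study the relationship between data set structure and attack cost, as well as defense methods.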

Reward Poisoning Attacks in Offline Multi-Agent Reinforcement Learning