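TL;DR: This paper proposes Spatial-Temporal Attention with Shapley (STAS), a new method that learns credit assignment along both the temporal and spatial dimensions. It achieves effective spatial-temporal credit assignment in multi-agent reinforcement learning and outperforms all existing baselines.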
Abstract
Centralized Training with Decentralized Execution (CTDE) has proven to be an effective paradigm in cooperative multi-agent reinforcement learning (MARL). One of its major remaining challenges is credit assignment,