We present a novel multi-agent RL approach, Selective Multi-Agent Prioritized Experience Relay, in which agents share with other agents a limited number of the transitions they observe during training. The intuition is that even a small number of relevant experiences from other agents could help each agent learn. Unlike many other multi-agent RL algorithms, this approach allows for largely decentralized training, requiring only a limited communication channel between agents. We show that our approach outperforms both a baseline of decentralized training without sharing and state-of-the-art multi-agent RL algorithms. Further, sharing only a small number of highly relevant experiences outperforms sharing all experiences between agents, and the performance gain from selective experience sharing is robust across a range of hyperparameters and DQN variants. A reference implementation of our algorithm is available at https://github.com/mgerstgrasser/super.
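
To make the idea of selective sharing concrete, here is a minimal, hypothetical sketch of one relay step. It assumes that each transition's priority is its absolute TD error (as in prioritized experience replay) and that each agent broadcasts a fixed budget of `k` top-priority transitions per step to every other agent's replay buffer. The names `Agent`, `select_to_share`, `relay_step`, and the parameter `k` are illustrative assumptions, not the reference implementation's API, and the actual selection rule may differ.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

# Hypothetical transition format: (obs, action, reward, next_obs, done).
Transition = Tuple[object, int, float, object, bool]


@dataclass
class Agent:
    """Illustrative agent holding a local (unbounded, for simplicity) replay buffer."""
    name: str
    buffer: List[Transition] = field(default_factory=list)


def select_to_share(transitions: Sequence[Transition],
                    td_errors: Sequence[float],
                    k: int) -> List[Transition]:
    """Return the k transitions with the largest absolute TD error (assumed priority)."""
    if k <= 0:
        return []
    top = heapq.nlargest(k, range(len(transitions)), key=lambda i: abs(td_errors[i]))
    return [transitions[i] for i in top]


def relay_step(agents: List[Agent],
               new_transitions: Dict[str, List[Transition]],
               td_errors: Dict[str, List[float]],
               k: int) -> Dict[str, List[Transition]]:
    """One decentralized step: store own data, then relay only the top-k to peers."""
    outgoing: Dict[str, List[Transition]] = {}
    for agent in agents:
        # Each agent always keeps all of its own experience.
        agent.buffer.extend(new_transitions[agent.name])
        # Only a small, high-priority subset goes over the communication channel.
        outgoing[agent.name] = select_to_share(
            new_transitions[agent.name], td_errors[agent.name], k)
    for sender in agents:
        for receiver in agents:
            if receiver is not sender:
                receiver.buffer.extend(outgoing[sender.name])
    return outgoing
```

With two agents and `k=1`, each agent ends up with all of its own transitions plus the single highest-|TD error| transition from its peer. The design choice being illustrated is that the communication cost scales with the budget `k`, not with the number of transitions each agent collects.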

Selectively Sharing Experiences Improves Multi-Agent Reinforcement Learning