We explore a collaborative multi-agent reinforcement learning setting where a team of agents attempts to solve cooperative tasks in partially-observable environments. In this scenario, learning an effective communication protocol is key. We propose a communication architecture that allows for targeted communication, where agents learn both what messages to send and whom to send them to, solely from downstream task-specific reward without any communication supervision. Additionally, we introduce a multi-stage communication approach where the agents coordinate via multiple rounds of communication before taking actions in the environment. We evaluate our approach on a diverse set of cooperative multi-agent tasks of varying difficulty, with varying numbers of agents, in a variety of environments ranging from 2D grid layouts of shapes and simulated traffic junctions to complex 3D indoor environments. We demonstrate the benefits of targeted as well as multi-stage communication. Moreover, we show that the targeted communication strategies learned by agents are both interpretable and intuitive.

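This paper proposes a targeted communication architecture for multi-agent reinforcement learning, in which agents performing cooperative tasks in partially observable environments learn what messages to send and whom to send them to. The method learns this targeting behavior solely from downstream task-specific reward, without any communication supervision. In addition, we strengthen coordination among agents through a multi-round communication approach, so that they can better adapt to changing environments. Our results across a variety of environments and tasks demonstrate the benefits of targeted and multi-round communication, and the learned targeted communication strategies are interpretable and intuitive. Finally, we show that our architecture can be easily extended to mixed and competitive settings, improving performance and sample complexity.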

TarMAC: Targeted Multi-Agent Communication