Deep reinforcement learning algorithms have recently been used to train multiple interacting agents in a centralised manner whilst keeping their execution decentralised. When the agents can only acquire partial observations and are faced with a task requiring coordination and synchronisation skills, inter-agent communication plays an essential role. In this work, we propose a framework for multi-agent training using deep deterministic policy gradients that enables the concurrent, end-to-end learning of an explicit communication protocol through a memory device. During training, the agents learn to perform read and write operations enabling them to infer a shared representation of the world. We empirically demonstrate that concurrent learning of the communication device and individual policies can improve inter-agent coordination and performance, and illustrate how different communication patterns can emerge for different tasks.

本文提出了一个基于深度确定性策略梯度的多智能体训练框架，利用存储设备并发端到端学习明确的通信协议，来提高小规模系统中智能体的协作和性能，同时研究了不同通信模式对性能的影响。

通过基于记忆的通信提高小规模多智体深度强化学习中的协调