Traditional distributed deep reinforcement learning (RL) commonly relies on
exchanging the experience replay memory (RM) of each agent. Since the RM
contains the entire history of state observations and action policies,
exchanging it may incur huge communication overhead and violate the privacy
of each agent.
Alternatively, this article presents a communication-efficient and
privacy-preserving distributed RL framework, coined federated reinforcement
distillation (FRD). In FRD, each agent exchanges its proxy experience replay
memory (ProxRM), in which policies are locally averaged with respect to proxy
states that cluster the actual states. To provide FRD design insights, we present
ablation studies on the impact of ProxRM structures, neural network
architectures, and communication intervals. Furthermore, we propose an improved
version of FRD, coined mixup augmented FRD (MixFRD), in which ProxRM is
interpolated using the mixup data augmentation algorithm. Simulations in a
CartPole environment validate the effectiveness of MixFRD in reducing the
variance of mission completion time and communication cost, compared to the
benchmark schemes: vanilla FRD, federated reinforcement learning (FRL), and
policy distillation (PD).
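The ProxRM construction and its mixup interpolation described above can be sketched as follows. This is a minimal sketch under assumptions that the abstract does not state. States and policies are plain float lists. The proxy states are a fixed list of centroids, and each actual state is mapped to its nearest proxy state by squared Euclidean distance. Mixup follows the standard formulation, with a single weight λ ~ Beta(α, α) applied to both the proxy state and its policy. The function names and the default α = 0.2 are hypothetical, and MixFRD's exact interpolation rule may differ.

```python
import random
from collections import defaultdict


def nearest_proxy(state, proxy_states):
    """Return the index of the proxy state closest (squared Euclidean) to `state`."""
    return min(
        range(len(proxy_states)),
        key=lambda i: sum((s - p) ** 2 for s, p in zip(state, proxy_states[i])),
    )


def build_proxrm(experiences, proxy_states):
    """Build a proxy replay memory from (state, policy) pairs.

    Each actual state is assigned to its nearest proxy state, and the policies
    (action-probability vectors) assigned to the same proxy state are averaged.
    Proxy states that receive no samples are omitted.
    """
    sums = {}
    counts = defaultdict(int)
    for state, policy in experiences:
        k = nearest_proxy(state, proxy_states)
        if k not in sums:
            sums[k] = [0.0] * len(policy)
        sums[k] = [a + b for a, b in zip(sums[k], policy)]
        counts[k] += 1
    return {k: [v / counts[k] for v in sums[k]] for k in sums}


def mixup_proxrm(proxrm, proxy_states, n_samples, alpha=0.2, rng=None):
    """Generate mixup-interpolated (state, policy) pairs from a ProxRM.

    Two entries i, j are drawn at random and combined with the same weight
    lam ~ Beta(alpha, alpha):
        x = lam * x_i + (1 - lam) * x_j,  y = lam * y_i + (1 - lam) * y_j.
    A convex combination of two probability vectors is again a probability vector.
    """
    rng = rng or random.Random()
    keys = list(proxrm)
    samples = []
    for _ in range(n_samples):
        i, j = rng.choice(keys), rng.choice(keys)
        lam = rng.betavariate(alpha, alpha)
        x = [lam * a + (1 - lam) * b for a, b in zip(proxy_states[i], proxy_states[j])]
        y = [lam * a + (1 - lam) * b for a, b in zip(proxrm[i], proxrm[j])]
        samples.append((x, y))
    return samples
```

Consider proxy states `[-1.0, 0.0]` and `[1.0, 0.0]`. The experiences `([-0.9, 0.1], [1.0, 0.0])` and `([-1.1, 0.0], [0.5, 0.5])` both map to proxy state 0, and their policies average to `[0.75, 0.25]`. Only these averaged entries, or their mixup interpolations, would be shared, never the raw observations.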

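This article introduces a communication-efficient and privacy-preserving distributed reinforcement learning framework called federated reinforcement distillation (FRD). Simulations show that its improved version, MixFRD, reduces the variance of mission completion time and communication cost more than the benchmark schemes do.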


Proxy Experience Replay: Federated Distillation for Distributed Reinforcement Learning

In distributed reinforcement learning, it is common to exchange the
experience memory of each agent and thereby collectively train their local
models. The experience memory, however, contains all of the host agent's
preceding state observations and their corresponding policies, so exchanging
it may violate the agent's privacy. To avoid this problem, in this work, we
propose a privacy-preserving distributed reinforcement learning (RL) framework,
termed federated reinforcement distillation (FRD). The key idea is to exchange
a proxy experience memory comprising a pre-arranged set of states and
time-averaged policies, thereby preserving the privacy of actual experiences.
Using an advantage actor-critic RL architecture, we numerically evaluate the
effectiveness of FRD and investigate how its performance is affected by the
proxy memory structure and by different memory exchange rules.
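The exchange of proxy experience memories can be sketched as follows. This is a hypothetical sketch, and several parts of it are assumptions rather than details taken from the abstract. Each agent keeps a running time average of its policy for every pre-arranged proxy state, and observed states are assumed to be already mapped to proxy-state indices. The shared memories are merged by an unweighted average over the agents that have an entry for each proxy state. Each agent then measures a KL-divergence distillation loss between the merged policies and its own network's output. The class and function names, the aggregation rule, and the choice of loss are all illustrative.

```python
import math


class ProxyMemory:
    """Hypothetical per-agent proxy memory.

    Stores, for each pre-arranged proxy state (identified by an index), the
    time average of the policies the agent produced while its actual state
    was assigned to that proxy state.
    """

    def __init__(self, num_actions):
        self.num_actions = num_actions
        self.mean = {}   # proxy index -> time-averaged policy
        self.count = {}  # proxy index -> number of updates so far

    def update(self, proxy_idx, policy):
        # Incremental mean: m_n = m_{n-1} + (p_n - m_{n-1}) / n
        n = self.count.get(proxy_idx, 0) + 1
        m = self.mean.get(proxy_idx, [0.0] * self.num_actions)
        self.mean[proxy_idx] = [mi + (pi - mi) / n for mi, pi in zip(m, policy)]
        self.count[proxy_idx] = n


def aggregate(memories):
    """Assumed aggregation rule: average each proxy state's policy over the
    agents that have an entry for it (unweighted)."""
    merged = {}
    for k in set().union(*(m.mean for m in memories)):
        policies = [m.mean[k] for m in memories if k in m.mean]
        merged[k] = [sum(col) / len(policies) for col in zip(*policies)]
    return merged


def distillation_loss(global_memory, proxy_states, local_policy_fn, eps=1e-8):
    """Mean KL(global || local) over the proxy states in the global memory.

    local_policy_fn is a stand-in for the agent's policy network: it maps a
    proxy-state vector to a list of action probabilities.
    """
    total = 0.0
    for k, target in global_memory.items():
        local = local_policy_fn(proxy_states[k])
        total += sum(t * math.log((t + eps) / (q + eps))
                     for t, q in zip(target, local) if t > 0)
    return total / max(len(global_memory), 1)
```

Suppose two agents have time-averaged policies `[0.75, 0.25]` and `[0.25, 0.75]` at proxy state 0. Merging them gives the global entry `[0.5, 0.5]`, and an agent whose network already outputs that policy incurs zero distillation loss. In an actual agent this loss would be minimized by gradient descent on the network parameters, which this sketch omits.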

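This work proposes FRD, a privacy-preserving distributed reinforcement learning framework that preserves the privacy of actual experiences by exchanging a proxy experience memory. It evaluates the effectiveness of FRD on an advantage actor-critic reinforcement learning architecture and studies how the proxy memory structure and different memory exchange rules affect FRD's performance.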