We explore a Federated Reinforcement Learning (FRL) problem where $N$ agents
collaboratively learn a common policy without sharing their trajectory data. To
date, existing FRL work has primarily focused on agents operating in the same
or ``similar'' environments. In contrast, our problem setup allows for
arbitrarily large levels of environment heterogeneity. To obtain the optimal
policy that maximizes the average performance across all environments, which
may be completely different from one another, we propose two algorithms:
FedSVRPG-M and FedHAPG-M. In contrast to existing results, we demonstrate that
FedSVRPG-M and FedHAPG-M, both of which leverage momentum mechanisms, converge
exactly to a stationary point of the average performance function, regardless
of the magnitude of environment heterogeneity. Furthermore, by incorporating the
benefits of variance-reduction techniques or Hessian approximation, both
algorithms achieve state-of-the-art convergence results, characterized by a
sample complexity of $\mathcal{O}\left(\epsilon^{-\frac{3}{2}}/N\right)$.
Notably, our algorithms enjoy linear convergence speedups with respect to the
number of agents, highlighting the benefit of collaboration among agents in
finding a common policy.
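
As a minimal sketch of the objective described above, the average performance
function and the stationarity criterion can be written as follows. The symbols
$\theta$ (policy parameters), $J_i$ (expected return of agent $i$ in its own
environment), and $J$ are notation assumed here for illustration, and the
squared-norm form of the criterion is an assumption commonly paired with an
$\epsilon^{-\frac{3}{2}}$ rate rather than something stated in this text.
\[
  J(\theta) \;=\; \frac{1}{N}\sum_{i=1}^{N} J_i(\theta),
  \qquad
  \mathbb{E}\left\|\nabla J(\theta)\right\|^{2} \;\le\; \epsilon .
\]
Under this reading, a sample complexity of
$\mathcal{O}\left(\epsilon^{-\frac{3}{2}}/N\right)$ means that the number of
trajectories each agent must sample to meet the criterion shrinks in
proportion to the number of agents $N$, which is the linear speedup.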

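We propose two algorithms, FedSVRPG-M and FedHAPG-M. By leveraging momentum mechanisms, both algorithms converge exactly to a stationary point of the average performance function, regardless of the magnitude of environment heterogeneity. By further incorporating variance-reduction techniques or Hessian approximation, both achieve state-of-the-art convergence results, with a sample complexity of $\mathcal{O}\left(\epsilon^{-\frac{3}{2}}/N\right)$. Our algorithms also enjoy a linear convergence speedup, highlighting the benefit of collaboration among agents in finding a common policy.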