This paper considers the problem of learning a control policy for robot
motion planning with zero-shot generalization, i.e., neither data collection nor
policy adaptation is needed when the learned policy is deployed in new
environments. We develop a federated reinforcement learning framework that
enables collaborative learning among multiple learners and a central server, i.e.,
the Cloud, without sharing their raw data. In each iteration, each learner
uploads its local control policy and the corresponding estimated normalized
arrival time to the Cloud, which then computes the global optimum among the
learners and broadcasts the optimal policy to the learners. Each learner then
selects between its local control policy and that from the Cloud for the next
iteration. The proposed framework leverages the derived zero-shot
generalization guarantees on arrival time and safety. Theoretical guarantees on
almost-sure convergence, almost consensus, Pareto improvement and optimality
gap are also provided. Monte Carlo simulations are conducted to evaluate the
proposed framework.
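
The per-iteration exchange described above can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the names `Upload`, `cloud_select`, `learner_select`, and `federated_round` are hypothetical, the local policy-update step is omitted, and two rules are assumed because the abstract does not state them: a lower estimated normalized arrival time is better, and a learner adopts the Cloud's policy only when its uploaded estimate is strictly lower than the learner's own.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence


@dataclass(frozen=True)
class Upload:
    """What a learner sends to the Cloud: a policy and a scalar score, no raw data."""
    learner_id: int
    policy: Any                 # placeholder for the local control policy
    est_arrival_time: float     # estimated normalized arrival time (assumed: lower is better)


def cloud_select(uploads: Sequence[Upload]) -> Upload:
    """Cloud step: pick the best uploaded policy among the learners."""
    if not uploads:
        raise ValueError("at least one upload is required")
    return min(uploads, key=lambda u: u.est_arrival_time)


def learner_select(local: Upload, broadcast: Upload) -> Upload:
    """Learner step (assumed rule): keep the local policy unless the
    broadcast policy has a strictly lower estimated arrival time."""
    if broadcast.est_arrival_time < local.est_arrival_time:
        return broadcast
    return local


def federated_round(uploads: Sequence[Upload]) -> List[Upload]:
    """One iteration: upload, Cloud selection, broadcast, per-learner choice."""
    best = cloud_select(uploads)
    return [learner_select(u, best) for u in uploads]


if __name__ == "__main__":
    uploads = [
        Upload(0, "pi_0", 0.8),
        Upload(1, "pi_1", 0.5),
        Upload(2, "pi_2", 0.9),
    ]
    for chosen in federated_round(uploads):
        print(chosen.policy, chosen.est_arrival_time)
```

Only the policy and a scalar estimate cross the network in this sketch, which mirrors the claim that raw data is never shared. In practice each learner would also update its local policy between rounds.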
