Federated Reinforcement Learning (FRL) is regarded as a promising
solution for intelligent decision-making in the era of the Artificial Intelligence of
Things (AIoT). However, existing FRL approaches often entail repeated interactions
with the environment during local updating, which can be prohibitively
expensive or even infeasible in many real-world domains. To overcome this
challenge, this paper proposes a novel offline federated policy optimization
algorithm, named $\texttt{DRPO}$, which enables distributed agents to
collaboratively learn a decision policy only from private and static data
without further environmental interactions. $\texttt{DRPO}$ leverages dual
regularization, incorporating both the local behavioral policy and the global
aggregated policy, to judiciously cope with the intrinsic two-tier
distributional shifts in offline FRL. Theoretical analysis characterizes the
impact of the dual regularization on performance, demonstrating that by
achieving the right balance thereof, $\texttt{DRPO}$ can effectively counteract
distributional shifts and ensure strict policy improvement in each federated
learning round. Extensive experiments validate the significant performance
gains of $\texttt{DRPO}$ over baseline methods.
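
To make the idea of dual regularization concrete, the following is a minimal sketch and not the $\texttt{DRPO}$ algorithm itself. It assumes a single state with a finite action set and assumes that both regularizers take the form of KL penalties, one toward the local behavioral policy and one toward the global aggregated policy; the abstract does not specify either choice. All names (`dual_regularized_policy`, `lam_local`, `lam_global`) and the numbers used are hypothetical. Under these assumptions the regularized objective has a closed-form maximizer, which the sketch computes.

```python
import math


def kl(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def objective(pi, q_values, behavior, global_policy, lam_local, lam_global):
    """Expected value minus the two KL penalties (assumed form, single state)."""
    expected_q = sum(p * q for p, q in zip(pi, q_values))
    return (expected_q
            - lam_local * kl(pi, behavior)
            - lam_global * kl(pi, global_policy))


def dual_regularized_policy(q_values, behavior, global_policy,
                            lam_local, lam_global):
    """Closed-form maximizer of `objective` over action distributions.

    Assumes every probability in `behavior` and `global_policy` is positive.
    The solution is a softmax over
        (Q + lam_local * log(behavior) + lam_global * log(global)) / (lam_local + lam_global).
    """
    total = lam_local + lam_global
    logits = [
        (q + lam_local * math.log(b) + lam_global * math.log(g)) / total
        for q, b, g in zip(q_values, behavior, global_policy)
    ]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in logits]
    z = sum(weights)
    return [w / z for w in weights]


# Hypothetical toy numbers: three actions, one state.
q_values = [1.0, 0.5, 0.0]        # assumed action-value estimates
behavior = [0.2, 0.5, 0.3]        # local behavioral policy (from static data)
global_policy = [0.4, 0.4, 0.2]   # global aggregated policy (from the server)

pi = dual_regularized_policy(q_values, behavior, global_policy, 1.0, 1.0)
print(pi)
```

In this toy setting, a larger `lam_local` pulls the result toward the behavioral policy and a larger `lam_global` pulls it toward the aggregated policy, which illustrates the kind of balance between the two regularizers that the abstract refers to.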

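This paper proposes DRPO, an offline federated policy optimization algorithm that uses dual regularization to address the two-tier distributional shifts in offline federated reinforcement learning, achieving significant performance gains in distributed intelligent decision-making.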