We apply diffusion strategies to develop a fully-distributed cooperative
reinforcement learning algorithm in which agents in a network communicate only
with their immediate neighbors to improve predictions about their environment.
The algorithm can also be applied to off-policy learning, meaning that the
agents can predict the response to a behavior different from the actual
policies they are following. The proposed distributed strategy is efficient,
with linear complexity in both computation time and memory footprint. We
provide a mean-square-error performance analysis and establish convergence
under constant step-size updates, which endow the network with continuous
learning capabilities. The results show a clear gain from cooperation: when the
individual agents can estimate the solution, cooperation increases stability
and reduces bias and variance of the prediction error; but, more importantly,
the network is able to approach the optimal solution even when none of the
individual agents can (e.g., when the individual behavior policies restrict
each agent to sampling only a small portion of the state space).

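Summary: Diffusion strategies are used to build a fully distributed cooperative reinforcement learning algorithm for networked agents, in which each agent communicates only with its immediate neighbors to improve its predictions about the environment. The distributed strategy is efficient, with linear computation time and memory footprint, and applies to off-policy learning and continuous learning; it reduces the bias and variance of the prediction error and lets the network approach the globally optimal solution.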
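
To make the diffusion idea concrete, the following is a minimal sketch of an adapt-then-combine update for cooperative prediction with linear value-function approximation. It is an illustration under stated assumptions, not the algorithm analyzed in this work: the local update is plain linear TD(0) with a constant step size rather than an off-policy correction, and the names `td_adapt`, `combine`, `diffusion_td`, the combination matrix `A`, and the default values of `gamma` and `mu` are all hypothetical.

```python
import numpy as np


def td_adapt(w, phi, reward, phi_next, gamma, mu):
    """Local adaptation: one linear TD(0) step with constant step size mu.

    Assumption: plain TD(0) stands in for whatever local update an agent uses.
    """
    delta = reward + gamma * (phi_next @ w) - (phi @ w)  # temporal-difference error
    return w + mu * delta * phi


def combine(A, psi):
    """Combination: agent k forms a weighted average of intermediate estimates.

    A[l, k] is the weight agent k gives to agent l (zero for non-neighbors);
    each column of A is assumed to sum to 1. psi has one row per agent.
    """
    return A.T @ psi


def diffusion_td(samples, A, gamma=0.9, mu=0.05):
    """Run adapt-then-combine over all agents.

    samples[k] is agent k's list of (phi, reward, phi_next) transitions;
    all agents are assumed to have the same number of transitions.
    """
    n_agents = A.shape[0]
    dim = samples[0][0][0].shape[0]
    w = np.zeros((n_agents, dim))
    for i in range(len(samples[0])):
        # Each agent adapts using only its own transition...
        psi = np.stack([
            td_adapt(w[k], *samples[k][i], gamma, mu) for k in range(n_agents)
        ])
        # ...then exchanges intermediate estimates with its neighbors.
        w = combine(A, psi)
    return w
```

In this sketch each agent stores a single weight vector and performs a constant number of vector operations per neighbor and per step, so per-agent cost grows linearly with the feature dimension. The constant step size `mu` is what keeps the agents adapting indefinitely rather than freezing their estimates.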