In principle, reinforcement learning and policy search methods can enable
robots to learn highly complex and general skills that may allow them to
function amid the complexity and diversity of the real world. However, training
a policy that generalizes well across a wide range of real-world conditions
requires far greater quantity and diversity of experience than is practical to
collect with a single robot. Fortunately, multiple robots can share their
experience with one another and thereby learn a policy
collectively. In this work, we explore distributed and asynchronous policy
learning as a means to achieve generalization and improved training times on
challenging, real-world manipulation tasks. We propose a distributed and
asynchronous version of Guided Policy Search and use it to demonstrate
collective policy learning on a vision-based door opening task using four
robots. We show that it achieves better generalization, utilization, and
training times than the single-robot alternative.

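This paper explores distributed asynchronous policy learning as a means of enabling robots to generalize and of improving training efficiency on complex tasks. Experiments show that the approach improves generalization, utilization, and training time, yielding better results on a vision-based door-opening task.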

Collective robot reinforcement learning with distributed asynchronous guided policy search

Collective Robot Reinforcement Learning with Distributed Asynchronous Guided Policy Search
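The collective-learning idea in the abstract above (several robots feeding experience asynchronously into one shared policy) can be illustrated with a toy sketch. This is not Guided Policy Search: it assumes a hypothetical setting in which every robot's local controller outputs u = TRUE_GAIN * x, and a central learner fits a single global linear gain w by SGD on whatever samples arrive, in whatever order. That loosely mirrors the step in which a global policy is fitted to samples from local controllers, but it omits local trajectory optimization entirely. All names (TRUE_GAIN, robot_worker, central_learner, collective_training) are illustrative assumptions.

```python
import queue
import random
import threading

# Hypothetical toy setup (not the paper's system): every robot's local controller
# outputs u = TRUE_GAIN * x, and the global policy is a single linear gain w.
TRUE_GAIN = 2.0


def robot_worker(robot_id, n_samples, sample_queue, seed):
    """Simulated robot: collects (state, action) samples and shares them."""
    rng = random.Random(seed)  # per-robot RNG, so threads never share RNG state
    for _ in range(n_samples):
        # Each robot sees a different slice of the state space (diverse conditions).
        x = rng.uniform(robot_id, robot_id + 1)
        u = TRUE_GAIN * x  # action chosen by the robot's local controller
        sample_queue.put((x, u))
    sample_queue.put(None)  # tell the learner this robot has finished


def central_learner(sample_queue, n_robots, lr=0.01):
    """Fits the global gain by SGD on samples in whatever order they arrive."""
    w = 0.0
    finished = 0
    while finished < n_robots:
        item = sample_queue.get()  # blocks until some robot has shared a sample
        if item is None:
            finished += 1
            continue
        x, u = item
        # One SGD step on the squared error (w * x - u) ** 2.
        w -= lr * 2.0 * (w * x - u) * x
    return w


def collective_training(n_robots=4, samples_per_robot=200):
    sample_queue = queue.Queue()  # thread-safe channel shared by all robots
    robots = [
        threading.Thread(target=robot_worker, args=(i, samples_per_robot, sample_queue, i))
        for i in range(n_robots)
    ]
    for robot in robots:
        robot.start()
    w = central_learner(sample_queue, n_robots)
    for robot in robots:
        robot.join()
    return w


if __name__ == "__main__":
    print(f"learned global gain: {collective_training():.4f}")
```

Because the learner never waits for a particular robot, a slow robot does not stall the update; the interleaving of samples changes from run to run, but in this noiseless toy each SGD step only shrinks the error, so the learned gain ends up close to TRUE_GAIN regardless of arrival order.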

Cooperative games are those in which both agents share the same payoff
structure. Value-based reinforcement-learning algorithms, such as variants of
Q-learning, have been applied to learning cooperative games, but they only
apply when the game state is completely observable to both agents. Policy
search methods are a reasonable alternative to value-based methods for
partially observable environments. In this paper, we provide a gradient-based
distributed policy-search method for cooperative games and compare the notion
of a local optimum with that of a Nash equilibrium. We demonstrate the effectiveness
of this method experimentally in a small, partially observable simulated soccer
domain.

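This paper proposes a gradient-based distributed policy-search method for cooperative games in partially observable environments, compares the concepts of a local optimum and a Nash equilibrium, and reports experimental results demonstrating the method's effectiveness.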
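As a rough illustration of distributed gradient-based policy search in a cooperative game, the sketch below has each of two agents run gradient ascent on the shared expected payoff with respect to its own policy parameter only, treating the other agent as fixed. It rests on several assumptions that are not from the abstract: a hypothetical fully specified 2x2 coordination game (PAYOFF), sigmoid policies over two actions, and exact analytic gradients in place of the sample-based estimates a partially observable setting would require. The function names are likewise illustrative.

```python
import math

# Hypothetical 2x2 coordination game: both agents receive PAYOFF[a1][a2].
PAYOFF = [[10.0, 0.0],
          [0.0, 5.0]]


def prob_action0(theta):
    """Each agent's policy: P(action 0) = sigmoid(theta), else action 1."""
    return 1.0 / (1.0 + math.exp(-theta))


def expected_payoff(p, q):
    """Shared expected payoff when agent 1 plays 0 w.p. p and agent 2 w.p. q."""
    return (p * q * PAYOFF[0][0] + p * (1 - q) * PAYOFF[0][1]
            + (1 - p) * q * PAYOFF[1][0] + (1 - p) * (1 - q) * PAYOFF[1][1])


def distributed_gradient_ascent(theta1, theta2, lr=0.5, steps=2000):
    """Each agent ascends the shared payoff w.r.t. its own parameter only."""
    for _ in range(steps):
        p, q = prob_action0(theta1), prob_action0(theta2)
        # Partial derivatives of expected_payoff: each agent holds the other fixed.
        dv_dp = q * (PAYOFF[0][0] - PAYOFF[1][0]) + (1 - q) * (PAYOFF[0][1] - PAYOFF[1][1])
        dv_dq = p * (PAYOFF[0][0] - PAYOFF[0][1]) + (1 - p) * (PAYOFF[1][0] - PAYOFF[1][1])
        # Chain rule through the sigmoid (dp/dtheta = p * (1 - p)); both agents
        # update simultaneously from the same (p, q).
        theta1 += lr * dv_dp * p * (1 - p)
        theta2 += lr * dv_dq * q * (1 - q)
    return prob_action0(theta1), prob_action0(theta2)


if __name__ == "__main__":
    for start in (2.0, -2.0):
        p, q = distributed_gradient_ascent(start, start)
        print(f"start={start:+.1f}: P(a=0)=({p:.3f}, {q:.3f}), payoff={expected_payoff(p, q):.2f}")
```

Started from theta = 2, both agents move toward action 0 and a shared payoff near 10. Started from theta = -2, they settle on both playing action 1 with payoff near 5. That second outcome is a Nash equilibrium (a unilateral switch to action 0 drops the payoff to 0) and a local optimum of the gradient dynamics, yet it is not the global optimum, which is the relationship the abstract's comparison of local optima and Nash equilibria concerns.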