Distributed Reinforcement Learning (RL) frameworks are essential for mapping RL workloads onto multiple computational resources, enabling faster sample generation, value estimation, and policy improvement. These workloads require seamless integration of training, serving, and simulation. Existing frameworks, such as Ray, do not manage this orchestration efficiently. In this study, we propose a solution based on the Reactor Model, which constrains a set of actors to a fixed communication pattern. This allows the scheduler to eliminate the work needed for synchronization, such as acquiring and releasing locks for each actor or sending and processing coordination-related messages. Our framework, Lingua Franca (LF), a coordination language based on the Reactor Model, also provides a unified interface that allows users to automatically generate dataflow graphs for distributed RL. On average, LF outperformed Ray in generating samples from OpenAI Gym and Atari environments by 1.21x and 11.62x, respectively, reduced the average training time of synchronized parallel Q-learning by 31.2%, and accelerated Multi-Agent RL inference by 5.12x.
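
The idea that a fixed communication pattern removes the need for per-actor locking can be illustrated with a minimal Python sketch. This is not LF syntax and not the authors' implementation; the node names (`simulate`, `estimate`, `improve`), the `TOPOLOGY` table, and the `run_step` helper are hypothetical. The sketch assumes that, because the connections between components are declared up front and never change, an execution order can be computed once and reused, so the components run in that order without acquiring any locks or exchanging coordination messages.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical reactions; each one only sees the outputs of its declared inputs.
def simulate(_inputs):
    return [1.0, 2.0, 3.0]  # stand-in for rewards sampled from an environment

def estimate(inputs):
    rewards = inputs["simulate"]
    return sum(rewards) / len(rewards)  # stand-in for value estimation

def improve(inputs):
    return {"baseline": inputs["estimate"]}  # stand-in for a policy update

REACTIONS = {"simulate": simulate, "estimate": estimate, "improve": improve}

# Fixed communication pattern: each node maps to the set of nodes it reads from.
TOPOLOGY = {
    "simulate": set(),
    "estimate": {"simulate"},
    "improve": {"estimate"},
}

# The topology never changes, so the execution order is computed once, up front.
ORDER = tuple(TopologicalSorter(TOPOLOGY).static_order())

def run_step():
    """Run one step of the graph in the precomputed order, with no locks."""
    outputs = {}
    for name in ORDER:
        inputs = {src: outputs[src] for src in TOPOLOGY[name]}
        outputs[name] = REACTIONS[name](inputs)
    return outputs
```

Calling `run_step()` executes the three stages in dependency order. A real scheduler would additionally run independent nodes in parallel across workers, which this single-threaded sketch deliberately leaves out.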

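This work proposes a Reactor Model-based solution for distributed reinforcement learning frameworks. By constraining a set of actors to a fixed communication pattern, it optimizes the mapping and coordination of RL workloads and provides a unified interface. On average, it generates samples from OpenAI Gym and Atari environments 1.21x and 11.62x faster than Ray, respectively, shortens the average training time of synchronized parallel Q-learning by 31.2%, and accelerates Multi-Agent RL inference by 5.12x.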

Optimizing Distributed Reinforcement Learning with the Reactor Model and Lingua Franca