This paper introduces GEAR, a distributed, GPU-centric experience replay system designed for scalable reinforcement learning (RL) with large sequence models such as transformers. With such models, existing systems such as Reverb face considerable bottlenecks in memory, computation, and communication. GEAR improves memory efficiency by using the memory resources of GPU servers (both host memory and device memory) to manage trajectory data. It also lets decentralized GPU devices accelerate various trajectory selection strategies, avoiding computational bottlenecks. To improve communication efficiency, GEAR provides GPU kernels that collect trajectories using zero-copy access to host memory and remote direct memory access (RDMA) over InfiniBand. In cluster experiments, GEAR achieves up to 6x higher performance than Reverb when training state-of-the-art large RL models. GEAR is open-sourced at https://github.com/bigrl-team/gear.
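
As a rough illustration of the decentralized trajectory selection described above, the following minimal sketch lets each replay shard pick trajectories locally under either a uniform or a priority-weighted strategy. It is not GEAR's API: the names `Shard`, `select`, and `sample_batch` are hypothetical, and plain Python on the CPU stands in for the GPU kernels and RDMA transfers of the real system.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Shard:
    """One replay shard; a stand-in (assumption) for the data held by one GPU server."""
    trajectories: list = field(default_factory=list)  # one entry per stored trajectory
    priorities: list = field(default_factory=list)    # one non-negative weight per trajectory

    def add(self, trajectory, priority=1.0):
        self.trajectories.append(trajectory)
        self.priorities.append(priority)

    def select(self, k, strategy, rng):
        """Pick k local trajectory indices without consulting any other shard."""
        n = len(self.trajectories)
        if n == 0:
            return []
        if strategy == "uniform":
            return [rng.randrange(n) for _ in range(k)]
        if strategy == "prioritized":
            # Sampling with replacement, proportional to priority.
            # random.choices raises ValueError if every weight is zero.
            return rng.choices(range(n), weights=self.priorities, k=k)
        raise ValueError(f"unknown strategy: {strategy}")


def sample_batch(shards, per_shard, strategy, seed=0):
    """Assemble a batch from per-shard selections (hypothetical helper)."""
    rng = random.Random(seed)
    batch = []
    for shard in shards:
        for index in shard.select(per_shard, strategy, rng):
            batch.append(shard.trajectories[index])
    return batch
```

Because selection runs per shard, no central server has to rank every trajectory; only the selected trajectories would then need to be moved to the trainer.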

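This study introduces GEAR, a distributed, GPU-centric experience replay system for scalable reinforcement learning with large sequence models (such as transformers). GEAR improves memory efficiency by using the memory resources of GPU servers to manage trajectory data, and avoids computational bottlenecks by letting decentralized GPU devices accelerate various trajectory selection strategies. Cluster experiments show that, when training state-of-the-art large reinforcement learning models, GEAR achieves up to 6x the performance of Reverb. GEAR is open-sourced at https://github.com/bigrl-team/gear.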

GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models