BriefGPT.xyz
Jul, 2023
Parallel $Q$-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation
Zechu Li, Tao Chen, Zhang-Wei Hong, Anurag Ajay, Pulkit Agrawal
TL;DR
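This work introduces a parallel Q-learning scheme (PQL) that parallelizes data collection, policy learning, and value learning. PQL outperforms PPO in wall-clock training time while retaining the high sample efficiency of off-policy learning.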
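The core idea in the TL;DR is to run data collection, value learning, and policy learning concurrently rather than in lockstep. The toy Python sketch below illustrates that idea under explicit assumptions. It is not the authors' implementation. The names `SharedReplay`, `collect`, `learn`, and `run_parallel` are hypothetical. Python threads stand in for whatever processes and devices PQL actually uses, and the learners only sample batches as a placeholder for gradient steps.

```python
import threading
from collections import deque


class SharedReplay:
    """Thread-safe buffer shared by all workers (hypothetical stand-in for a replay buffer)."""

    def __init__(self, capacity):
        self._data = deque(maxlen=capacity)  # oldest transitions are evicted first
        self._lock = threading.Lock()

    def add(self, transitions):
        with self._lock:
            self._data.extend(transitions)

    def sample(self, batch_size):
        with self._lock:
            # Returns the most recent transitions; a real learner would sample at random.
            return list(self._data)[-batch_size:]

    def __len__(self):
        with self._lock:
            return len(self._data)


def collect(buf, num_steps, num_envs, log):
    # Data collection: each step yields one transition per parallel environment.
    for t in range(num_steps):
        buf.add([(t, env_id) for env_id in range(num_envs)])
    log["env_steps"] = num_steps * num_envs


def learn(name, buf, num_updates, batch_size, log):
    # Value or policy learning: each "update" reads a batch without waiting for the collector.
    for _ in range(num_updates):
        buf.sample(batch_size)  # placeholder for a gradient step on this batch
    log[name] = num_updates


def run_parallel(num_steps=100, num_envs=64, num_updates=50, batch_size=256):
    buf, log = SharedReplay(capacity=10_000), {}
    workers = [
        threading.Thread(target=collect, args=(buf, num_steps, num_envs, log)),
        threading.Thread(target=learn, args=("value_updates", buf, num_updates, batch_size, log)),
        threading.Thread(target=learn, args=("policy_updates", buf, num_updates, batch_size, log)),
    ]
    for w in workers:
        w.start()  # all three roles run at the same time
    for w in workers:
        w.join()
    return log, len(buf)
```

Calling `run_parallel()` runs the three roles side by side. In this sketch the learners never block on the collector, which is the property that lets a parallel scheme trade data freshness for wall-clock speed.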
Abstract
Reinforcement learning is time-consuming for complex tasks due to the need for large amounts of training data. Recent advances in GPU-based simulation, such as