BriefGPT.xyz
Oct, 2023
Boosting Continuous Control with Consistency Policy
Yuhui Chen, Haoran Li, Dongbin Zhao
TL;DR
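We propose CPQL, a new time-efficient method that converts noise into actions in a single step. It addresses the problems of time efficiency and accurate guidance when updating diffusion models, which enables policy improvement in offline reinforcement learning and extends seamlessly to online reinforcement learning tasks. Experiments show that CPQL achieves new state-of-the-art performance on 11 offline tasks and 21 online tasks, with inference nearly 45 times faster than Diffusion-QL.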
Abstract
Due to its training stability and strong expression, the diffusion model has attracted considerable attention in offline reinforcement learning. However, several challenges have also come with it: 1) The demand f