May, 2024
Diffusion Policies creating a Trust Region for Offline Reinforcement Learning
Tianyu Chen, Zhendong Wang, Mingyuan Zhou
TL;DR
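Diffusion Trusted Q-Learning (DTQL), a method for offline reinforcement learning, uses diffusion models as a powerful and expressive policy class while eliminating the need for iterative denoising sampling during both training and inference. This greatly improves computational efficiency, and the method shows superior performance and favorable algorithmic characteristics on multiple benchmark tasks.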
Abstract
Offline reinforcement learning (RL) leverages pre-collected datasets to train optimal policies. Diffusion Q-Learning (DQL), introducing diffusion models as a powerful and expressive policy class, significantly bo…