Feb 2024
Distilling Conditional Diffusion Models for Offline Reinforcement Learning through Trajectory Stitching
Shangzhe Li, Xinhua Zhang
TL;DR
A knowledge-distillation method based on data augmentation is proposed: a conditional diffusion model generates high-return trajectories, which are blended with the original trajectories by a novel stitching algorithm driven by a new reward generator. Applying the resulting dataset to behavior cloning yields a much smaller, shallow policy that outperforms or closely matches deep generative planners on several D4RL benchmarks.
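The distillation step described in the TL;DR reduces to behavior cloning a small student policy on a dataset that mixes the original transitions with transitions from the generated, stitched high-return trajectories. Below is a minimal sketch of that step only (not the authors' code); the dimensions, placeholder arrays standing in for the diffusion/stitching output, and training hyperparameters are all illustrative assumptions.

```python
# Hedged sketch: behavior-clone a small shallow policy on a mixed dataset.
# The generated data here is a stand-in for diffusion-generated, stitched trajectories.
import numpy as np
import torch
import torch.nn as nn

obs_dim, act_dim = 17, 6  # assumed dimensions, e.g. a walker2d-like D4RL task

# Placeholders: original offline dataset and (hypothetical) generated trajectories.
orig_obs = np.random.randn(10_000, obs_dim).astype(np.float32)
orig_act = np.random.randn(10_000, act_dim).astype(np.float32)
gen_obs = np.random.randn(5_000, obs_dim).astype(np.float32)
gen_act = np.random.randn(5_000, act_dim).astype(np.float32)

# Mix both sources into a single behavior-cloning dataset.
obs = torch.from_numpy(np.concatenate([orig_obs, gen_obs]))
act = torch.from_numpy(np.concatenate([orig_act, gen_act]))

# Small, shallow student policy (the distilled model).
policy = nn.Sequential(
    nn.Linear(obs_dim, 256), nn.ReLU(),
    nn.Linear(256, act_dim),
)
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

for step in range(1_000):
    idx = torch.randint(len(obs), (256,))
    loss = nn.functional.mse_loss(policy(obs[idx]), act[idx])  # plain BC loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```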
Abstract
Deep generative models have recently emerged as an effective approach to offline reinforcement learning. However, their large model size poses challenges in computation. We address this issue by proposing a …