BriefGPT.xyz
Oct, 2023
Score Regularized Policy Optimization through Diffusion Behavior
Huayu Chen, Cheng Lu, Zhengyi Wang, Hang Su, Jun Zhu
TL;DR
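We propose a method for efficiently extracting a deterministic inference policy from critic models and a pretrained diffusion behavior model. The latter is used during optimization to regularize the policy directly with the score function of the behavior distribution, which avoids the computationally intensive and time-consuming diffusion sampling scheme entirely during both training and evaluation. The method retains the strong generative capability of diffusion modeling and, on D4RL tasks, increases action sampling speed by more than 25 times while still maintaining state-of-the-art performance.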
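To make the idea in the TL;DR concrete, here is a minimal toy sketch of score-regularized optimization of a deterministic action. It is not the authors' implementation. It assumes a one-dimensional action with no state, a hand-written critic Q(a) = -(a - 2)^2, and a standard normal behavior distribution whose score is known analytically. In the summarized method a pretrained diffusion behavior model would supply that score. The function names and the temperature parameter `beta` are hypothetical.

```python
# Toy sketch: one-dimensional action, no state, hand-written critic and behavior score.
# All names are hypothetical and are not taken from the paper's code.

def critic_grad(a):
    """Gradient of an assumed critic Q(a) = -(a - 2)^2 with respect to the action."""
    return -2.0 * (a - 2.0)


def behavior_score(a):
    """Score d/da log mu(a) of an assumed behavior distribution mu = N(0, 1).

    Assumption: this analytic score stands in for the score that a
    pretrained diffusion behavior model would provide.
    """
    return -a


def optimize_action(beta=1.0, lr=0.1, steps=200, a0=0.0):
    """Gradient ascent on Q(a) + (1 / beta) * log mu(a) for a deterministic action."""
    a = a0
    for _ in range(steps):
        # Critic gradient pushes toward high value; the behavior score
        # pulls the action back toward the data distribution.
        a += lr * (critic_grad(a) + behavior_score(a) / beta)
    return a


def closed_form(beta=1.0):
    """Stationary point of the toy objective: 4 / (2 + 1 / beta)."""
    return 4.0 / (2.0 + 1.0 / beta)
```

In this toy setting the loop never samples from the behavior distribution. It only evaluates the score at the current action, which is the property the TL;DR attributes to the method. A larger `beta` weakens the regularization and moves the result toward the critic's maximizer at 2, while a smaller `beta` pulls it toward the behavior mode at 0.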
Abstract
Recent developments in offline reinforcement learning have uncovered the immense potential of diffusion modeling, which excels at representing heterogeneous behavior policies. However, sampling from diffusion pol