BriefGPT.xyz
Sep, 2024
Q-value Regularized Decision ConvFormer for Offline Reinforcement Learning
Teng Yan, Zhendong Ruan, Yaobang Cai, Yu Han, Wenxian Li...
TL;DR
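This study addresses the problem of inconsistent sampled returns in Decision Transformer methods for offline reinforcement learning by proposing a novel Q-value Regularized Decision ConvFormer (QDC). By incorporating dynamic programming to maximize action values, QDC ensures that the expected return of the sampled actions is consistent with the optimal return. It performs strongly on the D4RL benchmark and is especially competitive in trajectory stitching.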
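The idea of regularizing action prediction with a Q-value term can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `q_regularized_loss`, the weight `eta`, the scalar actions, and the exact form of the objective (a mean-squared imitation term minus a weighted mean Q-value) are all assumptions made for illustration only.

```python
from typing import Sequence


def q_regularized_loss(
    pred_actions: Sequence[float],
    data_actions: Sequence[float],
    q_values: Sequence[float],
    eta: float = 1.0,
) -> float:
    """Hypothetical objective: imitation error minus eta * mean Q-value.

    pred_actions: actions proposed by the sequence model (scalars here).
    data_actions: actions recorded in the offline dataset.
    q_values:     critic estimates Q(s, a_pred) for the proposed actions.
    eta:          assumed trade-off weight between imitation and value.
    """
    if not (len(pred_actions) == len(data_actions) == len(q_values)):
        raise ValueError("all inputs must have the same length")
    n = len(pred_actions)
    if n == 0:
        raise ValueError("inputs must be non-empty")

    # Imitation term: mean squared error against the dataset actions.
    imitation = sum((p - d) ** 2 for p, d in zip(pred_actions, data_actions)) / n

    # Value term: mean critic estimate for the proposed actions.
    mean_q = sum(q_values) / n

    # Minimizing this loss keeps actions close to the data while
    # pushing them toward higher estimated action values.
    return imitation - eta * mean_q


loss = q_regularized_loss([1.0, 2.0], [1.0, 4.0], [0.5, 1.5], eta=0.5)
```

In this toy call the imitation error is 2.0 and the mean Q-value is 1.0, so the loss is 2.0 - 0.5 * 1.0 = 1.5. A larger `eta` would weight the value term more heavily relative to staying close to the dataset actions.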
Abstract
As a data-driven paradigm, Offline Reinforcement Learning (Offline RL) has been formulated as sequence modeling, where the Decision Transformer (DT) has demonstrated exceptional capabilities. Unlike previous rein