BriefGPT.xyz
Oct, 2024
Reinforcement Learning Gradients Improve Online Finetuning of Decision Transformers
Reinforcement Learning Gradients as Vitamin for Online Finetuning Decision Transformers
Kai Yan, Alexander G. Schwing, Yu-Xiong Wang
TL;DR
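This work gives a theoretical analysis of why online finetuning of decision transformers falls short, and points out that the conventional way of computing the expected return has a negative effect on the finetuning process. Experiments show that adding TD3 gradients to the finetuning process of the online decision transformer significantly improves its online finetuning performance, especially when it is pretrained on low-reward offline data. This offers a new direction for further improving decision transformers.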
Abstract
Decision Transformers have recently emerged as a new and compelling paradigm for offline Reinforcement Learning (RL), completing a trajectory in an autoregressive way. While improvements have been made to overcom