BriefGPT.xyz
Feb, 2022
Online Decision Transformer
Qinqing Zheng, Amy Zhang, Aditya Grover
TL;DR
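This paper proposes Online Decision Transformer (ODT), a sequence-modeling-based algorithm that combines sequence-level entropy regularization with autoregressive modeling objectives across offline pretraining and online finetuning, enabling efficient exploration and finetuning. Experiments on the D4RL benchmark show that ODT is competitive with state-of-the-art methods in absolute performance and achieves more significant gains during finetuning.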
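To make the idea of pairing an autoregressive likelihood objective with an entropy regularizer concrete, here is a minimal sketch. It is an illustration only, not the authors' implementation: it assumes a diagonal Gaussian action distribution, a fixed `entropy_weight`, and hypothetical function names (`gaussian_nll`, `gaussian_entropy`, `regularized_loss`). Both terms are averaged over the steps of a trajectory, which is one simple reading of "sequence-level".

```python
import math

def gaussian_nll(action, mean, std):
    """Negative log-likelihood of `action` under a diagonal Gaussian."""
    return sum(
        0.5 * math.log(2 * math.pi * s * s) + (a - m) ** 2 / (2 * s * s)
        for a, m, s in zip(action, mean, std)
    )

def gaussian_entropy(std):
    """Differential entropy of a diagonal Gaussian with the given std."""
    return sum(0.5 * math.log(2 * math.pi * math.e * s * s) for s in std)

def regularized_loss(actions, means, stds, entropy_weight):
    """Mean per-step NLL minus entropy_weight times mean per-step entropy.

    `actions`, `means`, and `stds` are lists with one entry per time step;
    each entry is a list with one value per action dimension.
    The fixed `entropy_weight` is an assumption made for this sketch.
    """
    n = len(actions)
    nll = sum(gaussian_nll(a, m, s) for a, m, s in zip(actions, means, stds)) / n
    entropy = sum(gaussian_entropy(s) for s in stds) / n
    return nll - entropy_weight * entropy

# Toy two-step trajectory with one-dimensional actions (made-up numbers).
actions = [[0.1], [-0.2]]
means = [[0.0], [0.0]]
stds = [[1.0], [1.0]]
loss = regularized_loss(actions, means, stds, entropy_weight=0.1)
```

With the weight set to zero the loss reduces to plain likelihood-based imitation of the logged actions; a positive weight rewards wider predicted action distributions, which is the intuition behind using entropy to encourage exploration.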
Abstract
Recent work has shown that offline reinforcement learning (RL) can be formulated as a sequence modeling problem (Chen et al., 2021; Janner et al., 2021) and solved via approaches similar to large-scale language m