BriefGPT.xyz
Jun, 2021
A Sequence Modeling Approach to Offline Reinforcement Learning
Reinforcement Learning as One Big Sequence Modeling Problem
Michael Janner, Qiyang Li, Sergey Levine
TL;DR
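This paper shows how reinforcement learning can be treated as a sequence modeling problem. It uses a Transformer architecture to model distributions over trajectories and repurposes beam search as a planning algorithm. The authors demonstrate the flexibility and effectiveness of the approach on long-horizon dynamics prediction, imitation learning, goal-conditioned reinforcement learning, and offline reinforcement learning. Combined with existing model-based algorithms, the approach yields a state-of-the-art planner for sparse-reward, long-horizon tasks.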
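As a rough illustration of how beam search can act as a planner over a sequence model, here is a minimal sketch. It is not the authors' implementation: `step_scores` is a hypothetical stand-in for a trained trajectory model, `toy_model` is an invented scoring rule, and the assumption that per-token scores are additive (log-probabilities, or predicted rewards when planning) is ours.

```python
import math
from typing import Callable, List, Sequence, Tuple

Beam = Tuple[Tuple[int, ...], float]


def beam_search(
    step_scores: Callable[[Tuple[int, ...]], Sequence[float]],
    horizon: int,
    beam_width: int,
) -> Beam:
    """Return the best-scoring token sequence of length `horizon` and its score.

    `step_scores(prefix)` is a hypothetical model interface: given the tokens
    chosen so far, it returns one additive score per candidate next token.
    """
    beams: List[Beam] = [((), 0.0)]
    for _ in range(horizon):
        candidates: List[Beam] = []
        for prefix, total in beams:
            for token, score in enumerate(step_scores(prefix)):
                candidates.append((prefix + (token,), total + score))
        # Keep only the `beam_width` highest-scoring partial sequences.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]


def toy_model(prefix: Tuple[int, ...]) -> List[float]:
    """Invented scores over a 3-token vocabulary: repeating a token is penalized."""
    last = prefix[-1] if prefix else None
    return [math.log(0.1) if token == last else math.log(0.45) for token in range(3)]


best_sequence, best_score = beam_search(toy_model, horizon=3, beam_width=2)
```

Swapping the quantity returned by `step_scores` from a likelihood to a reward estimate is what would turn this from a decoding routine into a planning routine; the search loop itself stays the same.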
Abstract
Reinforcement learning (RL) is typically concerned with estimating single-step policies or single-step models, leveraging the Markov property to factorize the problem in time. However, we can also view RL as a sequence …