We present a framework that abstracts Reinforcement Learning (RL) as a sequence modeling problem. This allows us to draw upon the simplicity and scalability of the Transformer architecture, and associated advances in language modeling such as GPT-x and BERT. In particular, we present Decision Transformer, an architecture that casts the problem of RL as conditional sequence modeling. Unlike prior approaches to RL that fit value functions or compute policy gradients, Decision Transformer simply outputs the optimal actions by leveraging a causally masked Transformer. By conditioning an autoregressive model on the desired return (reward), past states, and actions, our Decision Transformer model can generate future actions that achieve the desired return. Despite its simplicity, Decision Transformer matches or exceeds the performance of state-of-the-art model-free offline RL baselines on Atari, OpenAI Gym, and Key-to-Door tasks.

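Summary: Building on sequence modeling, this paper proposes a framework that abstracts reinforcement learning as a sequence modeling problem and applies the Transformer architecture and related language-modeling advances (such as GPT-x and BERT) to reinforcement learning tasks. The proposed Decision Transformer uses an autoregressive model to output future actions that achieve a desired return, and it matches or exceeds state-of-the-art model-free offline RL baselines in experiments on Atari, OpenAI Gym, and Key-to-Door tasks.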
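To make the conditioning idea concrete, the following is a minimal sketch of how a trajectory might be turned into the sequence that an autoregressive model consumes. It rests on assumptions the abstract does not spell out: that the "desired return" at each step is the undiscounted sum of rewards from that step to the end of the episode, and that the sequence interleaves return, state, and action in time order. The helper names `returns_to_go` and `build_sequence` are hypothetical and exist only for illustration.

```python
import numpy as np


def returns_to_go(rewards):
    """Undiscounted sum of rewards from each timestep to the episode end.

    Assumption: this stands in for the "desired return" the model is
    conditioned on; the abstract does not define it precisely.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    # Reverse cumulative sum: rtg[t] = rewards[t] + ... + rewards[-1]
    return np.cumsum(rewards[::-1])[::-1]


def build_sequence(rewards, states, actions):
    """Interleave (return, state, action) triples in time order."""
    rtg = returns_to_go(rewards)
    sequence = []
    for t in range(len(rewards)):
        sequence.append(("return", float(rtg[t])))
        sequence.append(("state", states[t]))
        sequence.append(("action", actions[t]))
    return sequence


# Toy trajectory with placeholder states and actions (illustrative only).
rewards = [0.0, 1.0, 0.0, 2.0]
states = ["s0", "s1", "s2", "s3"]
actions = ["a0", "a1", "a2", "a3"]

seq = build_sequence(rewards, states, actions)
for token in seq[:6]:
    print(token)
```

Under these assumptions, a causally masked model trained on such sequences would predict each action from the tokens that precede it. At evaluation time the first return token would be set to the target return chosen by the user rather than computed from observed rewards.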

Decision Transformer: Reinforcement Learning via Sequence Modeling