Reinforcement learning via sequence modeling has shown remarkable promise in autonomous systems, harnessing offline datasets to make informed decisions in simulated environments. However, the full potential of such methods in complex, dynamic environments remains to be discovered. In the autonomous driving domain, learning-based agents face significant challenges when transferring knowledge from simulated to real-world settings, and their performance is also significantly affected by data distribution shift. To address these issues, we propose the Sample-efficient Imitative Multi-token Decision Transformer (SimDT). SimDT introduces multi-token prediction, imitative online learning, and prioritized experience replay to the Decision Transformer. We evaluate its performance through empirical experiments, and the results exceed those of popular imitation and reinforcement learning algorithms on the Waymax benchmark.
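SimDT is described as adding prioritized experience replay to the Decision Transformer, but the abstract does not say how priorities are defined. The sketch below shows the standard proportional variant (Schaul et al., 2016) in Python, assuming each stored driving segment carries a scalar priority such as its most recent training loss. The class name `PrioritizedReplay` and its parameters (`alpha`, `beta`, `eps`) are illustrative assumptions, not SimDT's actual implementation.

```python
import random
from typing import List, Optional, Sequence, Tuple


class PrioritizedReplay:
    """Proportional prioritized replay (hypothetical sketch, not SimDT's code).

    Assumes each stored item (e.g. a driving segment) has a scalar priority,
    such as its last training loss; SimDT's real priority signal may differ.
    """

    def __init__(self, alpha: float = 0.6, eps: float = 1e-6) -> None:
        self.alpha = alpha  # 0 gives uniform sampling, 1 is fully proportional
        self.eps = eps      # small offset so zero-priority items stay sampleable
        self.items: List[object] = []
        self.priorities: List[float] = []

    def add(self, item: object, priority: float) -> None:
        self.items.append(item)
        self.priorities.append(priority)

    def probabilities(self) -> List[float]:
        # P(i) = (p_i + eps)^alpha / sum_k (p_k + eps)^alpha
        scaled = [(p + self.eps) ** self.alpha for p in self.priorities]
        total = sum(scaled)
        return [s / total for s in scaled]

    def sample(
        self, batch_size: int, beta: float = 0.4, rng: Optional[random.Random] = None
    ) -> Tuple[List[int], List[float]]:
        rng = rng or random.Random()
        probs = self.probabilities()
        n = len(self.items)
        idx = rng.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights offset the bias of non-uniform sampling;
        # dividing by the batch maximum keeps every weight in (0, 1].
        weights = [(n * probs[i]) ** (-beta) for i in idx]
        max_w = max(weights)
        return idx, [w / max_w for w in weights]

    def update(self, indices: Sequence[int], new_priorities: Sequence[float]) -> None:
        # Refresh priorities after a training step (e.g. with new per-item losses).
        for i, p in zip(indices, new_priorities):
            self.priorities[i] = p
```

With `alpha=1.0`, priorities of 1.0 and 3.0 give sampling probabilities of roughly 0.25 and 0.75; lowering `alpha` toward 0 flattens this toward uniform sampling, and the `beta`-controlled weights are meant to scale each sampled item's loss to compensate for that non-uniformity.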

Sample-efficient Imitative Multi-token Decision Transformer for Generalizable Real-World Driving