Text-based game environments are challenging because agents must process long sequences of text, execute compositional actions expressed in text, and learn from sparse rewards. We address these challenges by proposing Long-Context Language Decision Transformers (LLDTs), a framework based on long transformer language models and decision transformers (DTs). LLDTs extend DTs with three components: (1) an exponential tilt that guides the agent towards high obtainable goals, (2) novel goal conditioning methods that yield significantly better results than the traditional return-to-go (the sum of all future rewards), and (3) a model of future observations. Our ablation results show that predicting future observations improves agent performance. To the best of our knowledge, LLDTs are the first to address offline RL with DTs on these challenging games. Our experiments show that LLDTs achieve the highest scores among many different types of agents on some of the most challenging Jericho games, such as Enchanter.
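
To make two of the terms above concrete, the following is a minimal sketch, not the authors' implementation. It computes the return-to-go exactly as defined in the abstract (the sum of all future rewards), and it illustrates one common form of exponential tilting, in which a discrete distribution over goals is reweighted by `exp(alpha * goal)` and renormalized. The tilt formula, the function names, the `alpha` parameter, and the example numbers are all assumptions chosen for illustration.

```python
import math


def returns_to_go(rewards):
    """Return-to-go at each step: the sum of rewards from that step to the end."""
    rtg = []
    running = 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    rtg.reverse()
    return rtg


def exponential_tilt(goal_probs, alpha):
    """Reweight a discrete goal distribution by exp(alpha * goal), then renormalize.

    Assumed form for illustration: alpha = 0 leaves the distribution unchanged,
    and larger alpha shifts probability mass towards higher goals.
    """
    # Subtract the largest exponent for numerical stability.
    shift = max(alpha * g for g in goal_probs)
    weights = {g: p * math.exp(alpha * g - shift) for g, p in goal_probs.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


# Hypothetical sparse-reward episode: most steps give no reward.
rewards = [0, 0, 5, 0, 10]
rtg = returns_to_go(rewards)

# Hypothetical predicted distribution over achievable goals (goal -> probability).
goal_probs = {0: 0.5, 5: 0.3, 15: 0.2}
tilted = exponential_tilt(goal_probs, alpha=0.5)
```

In this sketch the return-to-go at the first step equals the total episode reward, and tilting with a positive `alpha` increases the probability assigned to the highest goal, which is the sense in which a tilt can steer an agent towards high goals that the model still considers obtainable.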

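By proposing Long-Context Language Decision Transformers (LLDTs), a framework built on long transformer language models and decision transformers, this work addresses the challenges that agents face in text-based game environments: processing long sequences of text, executing compositional actions through text, and learning from sparse rewards. Beyond traditional reward methods, it introduces three components: an exponential tilt that guides the agent towards high obtainable goals, novel goal conditioning methods, and a model of future observations. LLDTs achieve higher scores than many other types of agents on some of the most challenging Jericho games.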

Long-Context Language Decision Transformers and Exponential Tilt for Interactive Text Environments