Obtaining policies that can generalise to new environments in reinforcement learning is challenging. In this work, we demonstrate that language understanding via a reading policy learner is a promising vehicle for generalisation to new environments. We propose a grounded policy learning problem, Read to Fight Monsters (RTFM), in which the agent must jointly reason over a language goal, relevant dynamics described in a document, and environment observations. We procedurally generate environment dynamics and corresponding language descriptions of the dynamics, such that agents must read to understand new environment dynamics instead of memorising any particular information. In addition, we propose txt2$\pi$, a model that captures three-way interactions between the goal, document, and observations. On RTFM, txt2$\pi$ generalises to new environments with dynamics not seen during training via reading. Furthermore, our model outperforms baselines such as FiLM and language-conditioned CNNs on RTFM. Through curriculum learning, txt2$\pi$ produces policies that excel on complex RTFM tasks requiring several reasoning and coreference steps.

通过阅读策略学习器实现语言理解，是实现强化学习中泛化到新环境的有前途的方法。本文提出了一个基于文本阅读的策略学习问题，通过程序生成环境动态并相应地生成动态的语言描述。通过课程学习，我们的模型能够在需要几个推理和指代步骤的复杂任务上提供出色的策略。

通过阅读实现在新环境动态下的泛化