Interactive fiction games have emerged as an important application for improving
the generalization capabilities of language-based reinforcement learning (RL)
agents. Existing environments for interactive fiction games are domain-specific
or time-consuming to generate and do not train the RL agents to master a
specific set of skills. In this work, we introduce STARLING, an interactive
environment for self-supervised RL in text-based games. It bootstraps
text-based RL agents with automatically generated games (based on a seed set
of game ideas) to boost their performance and generalization capabilities in
reaching the goal of a target environment. These games let agents hone their
skills on a predefined set of tasks. We create and test an environment with 100
games generated by this automated framework, which combines a large language
model (GPT-3) with an interactive fiction game engine (based on Inform7) and
lets users generate more games under minimal human supervision. Experimental
results from both human participants and baseline text-based RL agents reveal
that current state-of-the-art text-based RL agents cannot use previously
learned skills in new situations at the level humans can. These results
reinforce STARLING's potential to serve as a sandbox environment for further
research in self-supervised text-based RL.

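The STARLING environment, built on automated game generation, helps text-based reinforcement learning agents improve their performance and generalization by training them on a predefined set of tasks to raise their skill level.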

STARLING: Self-supervised Training of Text-based Reinforcement Learning Agent with Large Language Models

In visual Reinforcement Learning (RL), upstream representation learning
largely determines the effectiveness of downstream policy learning. Employing
auxiliary tasks allows the agent to enhance visual representation in a targeted
manner, thereby improving the sample efficiency and performance of downstream
RL. Prior advanced auxiliary tasks all focus on extracting as much
information as possible from limited experience (including observations,
actions, and rewards) through their different auxiliary objectives. In this
article, we instead start from a different perspective: auxiliary training data.
We improve auxiliary representation learning for RL by enriching the
auxiliary training data, proposing Learning Future representation with
Synthetic observations (LFS), a novel
self-supervised RL approach. Specifically, we propose a training-free method to
synthesize observations that may contain future information, as well as a data
selection approach to eliminate unqualified synthetic noise. The remaining
synthetic observations and the real observations then serve as auxiliary data
for a clustering-based temporal association task for representation
learning. LFS lets the agent access and learn, in advance, from observations
that have not yet appeared, so that it can quickly understand and exploit them
when they occur later. In addition, LFS does not rely on rewards or actions, which means
it has a wider scope of application (e.g., learning from video) than recent
advanced auxiliary tasks. Extensive experiments demonstrate that LFS
achieves state-of-the-art RL sample efficiency on challenging continuous
control tasks and enables advanced visual pre-training based on action-free
video demonstrations.

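By enriching auxiliary training data with a training-free method that synthesizes observations likely to contain future information, this work improves auxiliary representation learning for reinforcement learning and demonstrates advanced performance on continuous control and on visual pre-training from action-free video demonstrations.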