Text adventure games present unique challenges to reinforcement learning
methods due to their combinatorially large action spaces and sparse rewards.
The interplay of these two factors is particularly demanding because large
action spaces require extensive exploration, while sparse rewards provide
limited feedback. This work proposes to tackle the explore-vs-exploit dilemma
using a multi-stage approach that explicitly disentangles these two strategies
within each episode. Our algorithm, called eXploit-Then-eXplore (XTX), begins
each episode using an exploitation policy that imitates a set of promising
trajectories from the past, and then switches over to an exploration policy
aimed at discovering novel actions that lead to unseen state spaces. This
policy decomposition allows us to combine global decisions about which parts of
the game space to return to with curiosity-based local exploration in that
space, motivated by how a human may approach these games. Our method
significantly outperforms prior approaches, by 27% and 11% in average
normalized score over 12 games from the Jericho benchmark (Hausknecht et al.,
2020) in the deterministic and stochastic settings, respectively. On the game of Zork1,
in particular, XTX obtains a score of 103, more than a 2x improvement over
prior methods, and pushes past several known bottlenecks in the game that have
plagued previous state-of-the-art methods.
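
The two-stage episode structure described above can be illustrated with a
minimal sketch. Everything here is an assumption made for illustration and is
not the authors' implementation: `ChainEnv` is a hypothetical toy game, the
exploitation stage simply replays the best stored action sequence instead of
training an imitation policy, and the exploration stage uses visit counts as a
stand-in for a curiosity signal.

```python
import random
from collections import defaultdict

ACTIONS = ["left", "right", "wait"]


class ChainEnv:
    """Hypothetical toy game: +1 each time the agent reaches a new rightmost cell."""

    def __init__(self, length=5, max_steps=20):
        self.length, self.max_steps = length, max_steps

    def reset(self):
        self.pos = self.frontier = self.t = 0
        return self.pos

    def step(self, action):
        if action == "right":
            self.pos = min(self.pos + 1, self.length)
        elif action == "left":
            self.pos = max(self.pos - 1, 0)
        self.t += 1
        reward = 0
        if self.pos > self.frontier:
            self.frontier, reward = self.pos, 1
        done = self.pos == self.length or self.t >= self.max_steps
        return self.pos, reward, done


def run_episode(env, buffer, counts, rng):
    """One episode: exploit a stored trajectory, then explore from where it ends."""
    state, done = env.reset(), False
    actions, score, last_reward_at = [], 0, 0

    def act(action):
        nonlocal state, done, score, last_reward_at
        state, reward, done = env.step(action)
        actions.append(action)
        score += reward
        if reward:
            last_reward_at = len(actions)

    # Stage 1 (exploit): replay the best trajectory seen so far.
    # Highest score wins; ties go to the shorter action sequence.
    if buffer:
        _, best_actions = max(buffer, key=lambda item: (item[0], -len(item[1])))
        for action in best_actions:
            if done:
                break
            act(action)

    # Stage 2 (explore): prefer the action tried least often in this state
    # (a simple count-based novelty rule, assumed here for illustration).
    while not done:
        fewest = min(counts[(state, a)] for a in ACTIONS)
        action = rng.choice([a for a in ACTIONS if counts[(state, a)] == fewest])
        counts[(state, action)] += 1
        act(action)

    # Keep only the prefix that earned the score, so later replays stop at the frontier.
    buffer.append((score, actions[:last_reward_at]))
    return score


def train(episodes=30, seed=0):
    env, buffer, counts = ChainEnv(), [], defaultdict(int)
    rng = random.Random(seed)
    scores = [run_episode(env, buffer, counts, rng) for _ in range(episodes)]
    return scores, buffer
```

Because the toy environment is deterministic, replaying the stored prefix
always recovers the best score so far, so per-episode scores never decrease.
The stochastic setting mentioned in the abstract would need a learned policy
rather than a literal replay.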

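This work addresses the very large action spaces and sparse rewards of text
adventure games by decomposing the policy in a multi-stage approach, yielding
the eXploit-Then-eXplore (XTX) algorithm. XTX improves on prior approaches by
27% and 11% in average normalized score in the deterministic and stochastic
settings, respectively. On Zork1 it scores 103, more than twice the score of
prior methods, and gets past bottlenecks that stopped previous
state-of-the-art methods.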

Multi-Stage Episodic Control for Strategic Exploration in Text Games

Text adventure games, in which players must make sense of the world through
text descriptions and declare actions through text descriptions, provide a
stepping stone toward grounding action in language. Prior work has demonstrated
that using a knowledge graph as a state representation and question-answering
to pre-train a deep Q-network facilitates faster control policy transfer. In
this paper, we explore the use of knowledge graphs as a representation for
domain knowledge transfer for training text-adventure playing reinforcement
learning agents. Our methods are tested across multiple computer-generated and
human-authored games, varying in domain and complexity, and demonstrate that
our transfer learning methods let us learn a higher-quality control policy
faster.
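
As a rough illustration of a knowledge graph used as a state representation,
the sketch below stores the agent's beliefs as (subject, relation, object)
triples. The class name `KnowledgeGraphState`, its methods, and the idea of
seeding a new game's graph by filtering on relation names are hypothetical
choices made for this example; the abstract does not specify how the graph is
built or transferred.

```python
from dataclasses import dataclass, field

Triple = tuple[str, str, str]  # (subject, relation, object)


@dataclass
class KnowledgeGraphState:
    """Hypothetical state: the set of facts the agent currently believes."""

    triples: set[Triple] = field(default_factory=set)

    def add(self, subject: str, relation: str, obj: str) -> None:
        # Record one fact extracted from a text observation.
        self.triples.add((subject, relation, obj))

    def neighbors(self, subject: str) -> list[tuple[str, str]]:
        # All (relation, object) pairs known about a subject, in sorted order.
        return sorted((r, o) for s, r, o in self.triples if s == subject)

    def seeded_copy(self, keep_relations: set[str]) -> "KnowledgeGraphState":
        # Assumed transfer rule: carry over only facts whose relation is
        # expected to hold in the new game as well.
        kept = {t for t in self.triples if t[1] in keep_relations}
        return KnowledgeGraphState(kept)


source = KnowledgeGraphState()
source.add("you", "in", "kitchen")      # location fact, specific to this game
source.add("knife", "is", "sharp")      # general fact, plausibly reusable
target = source.seeded_copy({"is"})     # start a new game with reusable facts only
```

Here `target` begins the new game already knowing that the knife is sharp,
while the location fact from the source game is dropped.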

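This work explores using knowledge graphs as a representation for domain
knowledge transfer when training reinforcement learning agents for text
adventure games. Tested across multiple games, the transfer learning methods
learn a higher-quality control policy faster.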