Text-based reinforcement learning involves an agent interacting with a
fictional environment using observed text and admissible actions in natural
language to complete a task. Previous works have shown that agents can succeed
in text-based interactive environments even in the complete absence of semantic
understanding or other linguistic capabilities. The success of these agents in
playing such games suggests that semantic understanding may not be important
for the task. This raises an important question about the benefits of language
models (LMs) in guiding the agents through the game states. In this work, we
show that rich
semantic understanding leads to efficient training of text-based RL agents.
Moreover, we describe the occurrence of semantic degeneration as a consequence
of inappropriate fine-tuning of language models in text-based reinforcement
learning (TBRL). Specifically, we describe the shift in the semantic
representation of words in the LM, as well as how it affects the performance of
the agent in tasks that are semantically similar to the training games. We
believe these results may help develop better strategies to fine-tune agents in
text-based RL scenarios.
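
One simple way to picture the "shift in the semantic representation of words" mentioned above is to compare each word's embedding before and after fine-tuning. The following is a minimal sketch under stated assumptions, not the measurement used in the paper: the function names `cosine` and `semantic_shift`, the use of cosine distance, and the two-dimensional toy vectors are all hypothetical and chosen only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def semantic_shift(before, after):
    """Return the cosine distance (1 - similarity) for every word in both tables."""
    return {w: 1.0 - cosine(before[w], after[w]) for w in before if w in after}

# Hypothetical toy embeddings (not real model outputs).
before = {"key": [1.0, 0.0], "door": [0.0, 1.0]}
after = {"key": [1.0, 0.0], "door": [1.0, 0.0]}  # "door" has collapsed onto "key"

shift = semantic_shift(before, after)
for word, distance in sorted(shift.items()):
    print(f"{word}: shift = {distance:.2f}")
```

In this toy case the vector for "key" is unchanged, while "door" has moved to coincide with "key", which is the kind of collapse the term "semantic degeneration" suggests. With a real LM, the vectors would come from the model's embedding layer before and after fine-tuning.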

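This paper studies text-based reinforcement learning and discusses how semantic understanding and linguistic capabilities affect the training efficiency of reinforcement learning agents and their performance on games semantically similar to the training games, with the aim of developing better strategies for fine-tuning agents in text-based RL scenarios.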

On the Effects of Fine-tuning Language Models for Text-Based Reinforcement Learning

Text-based reinforcement learning agents have predominantly been neural
network-based models with embeddings-based representation, learning
uninterpretable policies that often do not generalize well to unseen games. On
the other hand, neuro-symbolic methods, specifically those that leverage an
intermediate formal representation, are gaining significant attention in
language understanding tasks. Their advantages include inherent
interpretability, a lower requirement for training data, and better
generalization to scenarios with unseen data. Therefore, in this paper, we
propose a modular, NEuro-Symbolic Textual Agent (NESTA) that combines a generic
semantic parser with a rule induction system to learn abstract interpretable
rules as policies. Our experiments on established text-based game benchmarks
show that the proposed NESTA method outperforms deep reinforcement
learning-based techniques by achieving better generalization to unseen test
games and learning from fewer training interactions.
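
To make the idea of "abstract interpretable rules as policies" concrete, here is a minimal sketch of what acting from such a rule could look like. It is an assumption-laden illustration, not NESTA's semantic parser or rule induction system: the fact format, the predicate names `is_food` and `in_room`, the `RULES` table, and the function `actions_from_rules` are all hypothetical.

```python
# Hypothetical symbolic facts, as a parser might extract them from an
# observation: (predicate, entity) pairs.
facts = {("is_food", "apple"), ("in_room", "apple"), ("in_room", "table")}

# Hypothetical rule table: an action applies to any entity that satisfies
# every predicate listed for it, e.g. eat(x) <- is_food(x) AND in_room(x).
RULES = [
    ("eat", ("is_food", "in_room")),
]

def actions_from_rules(facts, rules):
    """Return the text actions licensed by the rules for the given facts."""
    entities = sorted({entity for _, entity in facts})
    actions = []
    for action, required in rules:
        for entity in entities:
            if all((pred, entity) in facts for pred in required):
                actions.append(f"{action} {entity}")
    return actions

chosen = actions_from_rules(facts, RULES)
print(chosen)
```

Because the rule is stated over predicates rather than specific words, it would apply unchanged to any new entity tagged `is_food`, which is the intuition behind the generalization to unseen games claimed above.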

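By combining a semantic parser with a rule induction system, we propose a modular NEuro-Symbolic Textual Agent (NESTA) that learns abstract, interpretable rules as policies and, on text-based game benchmarks, shows better generalization while requiring fewer training interactions.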