Large Language Models (LLMs) have demonstrated superior performance on
language understanding benchmarks. CALM, a popular approach, leverages the
linguistic priors of an LLM (GPT-2) to recommend action candidates, improving
performance on text games in Jericho without environment-provided
actions. However, CALM adapts GPT-2 with annotated human gameplay and keeps
the LLM fixed while learning the text-based games. In this work, we
explore and evaluate also updating the LLM used for candidate recommendation
during learning of the text-based game, to mitigate the reliance on
human-annotated gameplay, which is costly to acquire. We observe that by updating
the LLM during learning using carefully selected in-game transitions, we can
reduce the dependency on human-annotated gameplay for fine-tuning the
LLM. We conducted further analysis to study the transferability of the updated
LLMs and observed that transferring in-game trained models to other games did
not result in consistent transfer.
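
As a rough illustration of what "carefully selected in-game transitions" could look like in code, the sketch below filters a replay buffer and formats the survivors as fine-tuning pairs. The abstract does not state the selection criterion, so the reward threshold used here is an assumption, and the names Transition, select_transitions, to_finetune_pairs and the "[OBS]"/"[ACTION]" prompt markers are hypothetical rather than taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Transition:
    observation: str  # text the game showed before the agent acted
    action: str       # command the agent issued
    reward: float     # score change returned by the game


def select_transitions(buffer: List[Transition], min_reward: float = 0.0) -> List[Transition]:
    """Keep transitions whose reward exceeds min_reward.

    Assumption: reward thresholding stands in for the paper's
    unspecified selection rule.
    """
    return [t for t in buffer if t.reward > min_reward]


def to_finetune_pairs(selected: List[Transition]) -> List[Tuple[str, str]]:
    """Format (context, target action) pairs for a hypothetical LM update step."""
    return [(f"[OBS] {t.observation} [ACTION]", t.action) for t in selected]


# Toy replay buffer collected while the agent plays.
buffer = [
    Transition("You are in a dark room.", "turn on lamp", 5.0),
    Transition("You are in a dark room.", "sing", 0.0),
    Transition("A locked door blocks the way.", "unlock door with key", 10.0),
]
pairs = to_finetune_pairs(select_transitions(buffer))
```

In an LM-in-the-loop setup, pairs like these would periodically be fed to a fine-tuning step for the action-recommending LM in place of human-annotated gameplay; that step is omitted here.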

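By updating the large language model (LLM) during text-based game learning, this work reduces reliance on human-annotated gameplay and improves LLM performance, and it studies how well models trained in one game transfer to other games.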

Language Model-In-The-Loop: Data Optimal Approach to Learn-To-Recommend Actions in Text Games

Despite the recent success of large pretrained language models (LMs) on a
variety of prompting tasks, these models can be alarmingly brittle to small
changes in inputs or application contexts. To better understand such behavior
and motivate the design of more robust LMs, we propose a general experimental
framework, CALM (Competence-based Analysis of Language Models), where targeted
causal interventions are utilized to damage an LM's internal representation of
various linguistic properties in order to evaluate its use of each
representation in performing a given task. We implement these interventions as
gradient-based adversarial attacks, which (in contrast to prior causal probing
methodologies) are able to target arbitrarily-encoded representations of
relational properties, and carry out a case study of this approach to analyze
how BERT-like LMs use representations of several relational properties in
performing associated relation prompting tasks. We find that, while the
representations LMs leverage in performing each task are highly entangled, they
may be meaningfully interpreted in terms of the tasks where they are most
utilized; and more broadly, that CALM enables an expanded scope of inquiry in
LM analysis that may be useful in predicting and explaining weaknesses of
existing LMs.
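
To make the idea of a gradient-based attack on a representation concrete, here is a minimal sketch under strong simplifying assumptions: the "representation" is a plain vector, the property is read out by a linear logistic probe, and the attack is a projected sign-gradient (PGD-style) perturbation bounded in the L-infinity norm. The abstract targets arbitrarily-encoded representations, so this linear toy is not the authors' procedure; probe_prob, attack_representation, eps and steps are all hypothetical names and settings.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def probe_prob(h: List[float], w: List[float], b: float) -> float:
    """Probability that a linear logistic probe assigns to the property being present."""
    return sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b)


def attack_representation(h: List[float], w: List[float], b: float,
                          label: int, eps: float = 0.5, steps: int = 10) -> List[float]:
    """Perturb h within an L-infinity ball of radius eps to raise the probe's loss."""
    step = eps / steps
    adv = list(h)
    for _ in range(steps):
        p = probe_prob(adv, w, b)
        # Gradient of the cross-entropy loss with respect to the representation.
        grad = [(p - label) * wi for wi in w]
        # Ascend the loss along the sign of the gradient.
        adv = [a + step * ((g > 0) - (g < 0)) for a, g in zip(adv, grad)]
        # Project back into the eps-ball around the original representation.
        adv = [min(max(a, h0 - eps), h0 + eps) for a, h0 in zip(adv, h)]
    return adv


# Toy example: the probe is confident about the property before the attack.
h = [1.0, -0.5]
w = [2.0, -1.0]
b = 0.0
adv = attack_representation(h, w, b, label=1)
before, after = probe_prob(h, w, b), probe_prob(adv, w, b)
```

In a competence-style analysis, the damaged representation would then be passed on to the downstream task to see how much task performance depends on the property that was attacked.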

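This paper proposes the CALM experimental framework and uses gradient-based adversarial attacks to damage a language model's internal representations, in order to evaluate how it uses each representation when performing a given task. In a case study of BERT-like LMs performing the associated relation prompting tasks, it finds that the representations the LMs leverage for each task are highly entangled, but can be meaningfully interpreted in terms of the tasks where they are most utilized.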