Large language models (LLMs) are being applied as actors for sequential
decision-making tasks in domains such as robotics and games, utilizing their
general world knowledge and planning abilities. However, previous work does
little to explore what environment state information is provided to LLM actors
via language. Exhaustively describing high-dimensional states can impair
performance and raise inference costs for LLM actors. Previous LLM actors avoid
the issue by relying on hand-engineered, task-specific protocols to determine
which features to communicate about a state and which to leave out. In this
work, we propose Brief Language INputs for DEcision-making Responses (BLINDER),
a method for automatically selecting concise state descriptions by learning a
value function for task-conditioned state descriptions. We evaluate BLINDER on
the challenging video game NetHack and a robotic manipulation task. Our method
improves task success rate, reduces input size and compute costs, and
generalizes between LLM actors.
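
As a rough illustration of selecting a concise state description with a value function, the following is a minimal sketch, not the BLINDER implementation. The greedy loop, the function names, and `toy_value` (a hand-written keyword-overlap score standing in for a learned value function) are all assumptions made for this example.

```python
from typing import Callable, List, Sequence

ValueFn = Callable[[str, Sequence[str]], float]


def toy_value(task: str, description: Sequence[str]) -> float:
    """Hypothetical stand-in for a learned value function.

    Rewards features that share words with the task and charges a fixed
    cost per feature so that longer descriptions are not free.
    """
    task_words = set(task.lower().split())
    score = 0.0
    for feature in description:
        overlap = task_words & set(feature.lower().split())
        score += len(overlap) - 1.5  # 1.5 is an arbitrary per-feature cost
    return score


def select_description(
    task: str, features: Sequence[str], value: ValueFn = toy_value
) -> List[str]:
    """Greedily add the feature that raises the value most; stop when none does."""
    selected: List[str] = []
    remaining = list(features)
    best = value(task, selected)
    while remaining:
        scored = [(value(task, selected + [f]), f) for f in remaining]
        top_score, top_feature = max(scored, key=lambda pair: pair[0])
        if top_score <= best:
            break  # no remaining feature improves the description
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected


features = [
    "the red block is on the table",
    "the sky is blue",
    "a green block is in the bin",
]
print(select_description("pick up the red block", features))
```

With this toy score, the unrelated sky feature is left out of the description passed to the actor; a learned value function would replace `toy_value` in practice.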

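Large language models (LLMs) are used as actors for sequential decision-making tasks, applying their general world knowledge and planning abilities in domains such as robotics and games. This paper proposes BLINDER, a method that automatically selects concise state descriptions by learning a value function for task-conditioned state descriptions. Evaluated on NetHack (a challenging video game) and a robotic manipulation task, the method improves task success rate, reduces input size and compute costs, and generalizes between LLM actors.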

Selective Perception: Optimizing State Descriptions with Reinforcement Learning for Language Model Actors

Large language models (LLMs) struggle to process complicated observations in
interactive decision making. To alleviate this issue, we propose a simple
hierarchical prompting approach. Diverging from previous prompting approaches
that always put the full observation (e.g., a web page) into the prompt, we
propose to first construct an action-aware observation that is more condensed
and relevant, using a dedicated summarizer prompt. The actor prompt then
predicts the next action based on the summarized history. While our method has
broad applicability, we particularly demonstrate its efficacy in the complex
domain of web navigation, where a full observation often contains redundant and
irrelevant information. Our approach outperforms the previous state-of-the-art
prompting mechanism with the same LLM by 6.2% on task success rate,
demonstrating its potential on interactive decision-making tasks with long
observation traces.
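
The two-stage flow described above (summarize first, then act on the summarized history) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' prompts: the `LLM` callable interface, the prompt wording, and `fake_llm` are hypothetical and exist only so the example runs offline.

```python
from typing import Callable, List

LLM = Callable[[str], str]  # hypothetical interface: prompt in, completion out


def summarize(llm: LLM, task: str, observation: str, previous_action: str) -> str:
    """Summarizer prompt: condense the full observation for the current task."""
    prompt = (
        "Summarize the observation, keeping only what matters for the task.\n"
        f"Task: {task}\n"
        f"Previous action: {previous_action}\n"
        f"Observation: {observation}\n"
    )
    return llm(prompt)


def act(llm: LLM, task: str, history: List[str]) -> str:
    """Actor prompt: predict the next action from the summarized history only."""
    prompt = (
        "Choose the next action.\n"
        f"Task: {task}\n"
        "Summarized history:\n" + "\n".join(history) + "\n"
    )
    return llm(prompt)


def step(
    llm: LLM, task: str, observation: str, previous_action: str, history: List[str]
) -> str:
    """One turn: append a condensed observation, then ask the actor for an action."""
    history.append(summarize(llm, task, observation, previous_action))
    return act(llm, task, history)


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call, so the sketch runs without network access.
    if prompt.startswith("Summarize"):
        return "Page shows a red mug for $12 with a Buy button."
    return "click[Buy]"


history: List[str] = []
action = step(
    fake_llm, "buy a red mug", "<html>...long page...</html>", "search[red mug]", history
)
print(action, history)
```

The point of the split is that the full observation only ever reaches the summarizer; the actor prompt grows with short summaries rather than with raw pages.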

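A hierarchical prompting approach handles complicated observations in interactive decision making. In the complex domain of web navigation in particular, it improves task success rate by 6.2% over the previous state-of-the-art prompting mechanism, showing its potential for interactive decision-making tasks with long observation traces.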

Hierarchical Prompting Assists Large Language Model on Web Navigation

Multi-action dialog policy (MADP), which generates multiple atomic dialog
actions per turn, has been widely applied in task-oriented dialog systems to
provide expressive and efficient system responses. Existing MADP models usually
imitate action combinations from the labeled multi-action dialog samples. Due
to data limitations, they generalize poorly toward unseen dialog flows. While
interactive learning and reinforcement learning algorithms can be applied to
incorporate external data sources of real users and user simulators, they take
significant manual effort to build and suffer from instability. To address
these issues, we propose Planning Enhanced Dialog Policy (PEDP), a novel
multi-task learning framework that learns single-action dialog dynamics to
enhance multi-action prediction. Our PEDP method employs model-based planning
for conceiving what to express before deciding the current response through
simulating single-action dialogs. Experimental results on the MultiWOZ dataset
demonstrate that our fully supervised learning-based method achieves a solid
task success rate of 90.6%, a 3% improvement over state-of-the-art methods.
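
One possible reading of "planning by simulating single-action dialogs" is sketched below. It is a toy illustration and not the PEDP architecture: the state representation, `toy_policy`, `toy_dynamics`, and the rule of collecting the simulated actions into one multi-action turn are all assumptions made for this example.

```python
from typing import Callable, FrozenSet, List, Optional

State = FrozenSet[str]  # toy dialog state: atomic actions still worth expressing
Policy = Callable[[State], Optional[str]]
Dynamics = Callable[[State, str], State]


def toy_policy(state: State) -> Optional[str]:
    """Hypothetical single-action policy: pick one pending action, if any."""
    return min(state) if state else None


def toy_dynamics(state: State, action: str) -> State:
    """Hypothetical dialog model: an expressed action is no longer pending."""
    return state - {action}


def plan_multi_action(
    state: State, policy: Policy, dynamics: Dynamics, horizon: int = 3
) -> List[str]:
    """Simulate single-action turns and collect them as one multi-action turn."""
    planned: List[str] = []
    for _ in range(horizon):
        action = policy(state)
        if action is None or action in planned:
            break  # nothing new to express
        planned.append(action)
        state = dynamics(state, action)  # imagined next state, no real user involved
    return planned


start = frozenset({"request-area", "inform-price"})
print(plan_multi_action(start, toy_policy, toy_dynamics))
```

In this sketch the rollout happens entirely inside the model, which matches the abstract's emphasis on avoiding real users and user simulators; the learned components would replace the two toy functions.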

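This paper proposes Planning Enhanced Dialog Policy (PEDP), a multi-task learning framework that uses model-based planning to simulate single-action dialogs and thereby enhance multi-action prediction. It achieves a solid task success rate of 90.6%, a 3% improvement over state-of-the-art methods.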