Large Language Models have shown exceptional generative abilities in various
natural language and generation tasks. However, possible anthropomorphization
and leniency towards failure cases have propelled discussions on the emergent
abilities of Large Language Models, especially Theory of Mind (ToM) abilities.
While several false-belief tests exist to verify the
ability to infer and maintain mental models of another entity, we study a
special application of ToM abilities that has higher stakes and possibly
irreversible consequences: Human-Robot Interaction (HRI). In this work, we explore
the task of Perceived Behavior Recognition, where a robot employs a Large
Language Model (LLM) to assess the robot's generated behavior in a manner
similar to a human observer. We focus on four behavior types, namely
explicable, legible, predictable, and obfuscatory behavior, which have been
extensively used to synthesize interpretable robot behaviors. The LLM's goal is,
therefore, to act as a human proxy for the agent and to answer how a given agent
behavior would be perceived by the human in the loop, for example: "Given a
robot's behavior X, would the human observer find it explicable?" We conduct a
human-subject study to verify that users are able to correctly answer such
questions in the curated situations (robot setting and plan) across five
domains. A first analysis of the belief test yields extremely positive results,
inflating one's expectations of LLMs possessing ToM abilities. We then propose
and perform a suite of perturbation tests that break this illusion, namely the
Inconsistent Belief, Uninformative Context, and Conviction Tests. We conclude
that the high scores of LLMs on vanilla prompts showcase their potential use in
HRI settings; however, possessing ToM demands invariance to trivial or
irrelevant perturbations in the context, which LLMs lack.
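The invariance requirement in the conclusion can be made concrete as a consistency rate: ask the same question under the original context and under perturbed copies of it, and count how often the answer stays the same. This is a minimal sketch under stated assumptions, not the authors' test protocol: `ask` stands in for an LLM call, and the perturbation functions, the toy context, and both toy answerers are hypothetical illustrations.

```python
from typing import Callable, Sequence

# Hypothetical signature for a model query: (context, question) -> answer.
Ask = Callable[[str, str], str]
Perturbation = Callable[[str], str]


def invariance_rate(ask: Ask, question: str, context: str,
                    perturbations: Sequence[Perturbation]) -> float:
    """Fraction of perturbed contexts whose answer matches the unperturbed answer."""
    if not perturbations:
        return 1.0
    reference = ask(context, question)
    unchanged = sum(ask(perturb(context), question) == reference
                    for perturb in perturbations)
    return unchanged / len(perturbations)


# Hypothetical perturbations that should not change the judgment.
def add_irrelevant_sentence(context: str) -> str:
    # Appends text with no bearing on the robot's plan.
    return context + " The lab walls are painted blue."


def pad_whitespace(context: str) -> str:
    # Purely cosmetic formatting change.
    return "  " + context.strip() + "  "


# Toy stand-ins for an LLM (hypothetical, for illustration only).
def robust_ask(context: str, question: str) -> str:
    # Answers only from the task-relevant fact in the context.
    return "yes" if "reaches the goal directly" in context else "no"


def brittle_ask(context: str, question: str) -> str:
    # Answer depends on context length, an irrelevant feature.
    return "yes" if len(context) < 60 else "no"


QUESTION = "Would the human observer find this behavior explicable?"
CONTEXT = "Robot moves straight and reaches the goal directly."
PERTURBATIONS = [add_irrelevant_sentence, pad_whitespace]

if __name__ == "__main__":
    print(invariance_rate(robust_ask, QUESTION, CONTEXT, PERTURBATIONS))   # 1.0
    print(invariance_rate(brittle_ask, QUESTION, CONTEXT, PERTURBATIONS))  # 0.5
```

A rate below 1.0 flags sensitivity to context changes that should be irrelevant, which is the property the perturbation tests probe.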

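By studying the application of Large Language Models to human-robot interaction, this paper examines their ability to understand machine-generated behavior, particularly in recognizing the mental states of others, and finds that Large Language Models lack invariance to irrelevant or trivial changes.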


Theory of Mind abilities of Large Language Models in Human-Robot Interaction: An Illusion?

Preference-based Reinforcement Learning has shown much promise for utilizing
human binary feedback on queried trajectory pairs to recover the underlying
reward model of the Human in the Loop (HiL). While prior works have attempted to
better utilize the queries made to the human, in this work we make two
observations about the unlabeled trajectories collected by the agent and
propose two corresponding loss functions that ensure the participation of unlabeled
trajectories in the reward learning process and structure the embedding space
of the reward model such that it reflects the structure of the state space with
respect to action distances. We validate the proposed method on one locomotion
domain and one robotic manipulation task and compare with the state-of-the-art
baseline PEBBLE. We further present an ablation of the proposed loss components
across both domains and find that not only does each loss component
perform better than the baseline, but the synergistic combination of the two also
achieves much better reward recovery and human-feedback sample efficiency.
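For reference, the labeled-query part of this setting is commonly modeled with the Bradley-Terry formulation used by PEBBLE: the probability that the human prefers segment 1 over segment 0 is the logistic function of the difference in predicted segment returns, and the reward model is trained with binary cross-entropy against the human's labels. The sketch below shows only that standard loss on labeled pairs; the abstract does not specify the two proposed losses for unlabeled trajectories, so they are not reproduced here. The pure-Python reward function, the step features, and the toy segments are hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple

Step = Tuple[float, ...]          # hypothetical (state, action) feature vector
Segment = Sequence[Step]
RewardFn = Callable[[Step], float]


def segment_return(reward: RewardFn, segment: Segment) -> float:
    """Sum of predicted per-step rewards over a trajectory segment."""
    return sum(reward(step) for step in segment)


def preference_prob(reward: RewardFn, seg0: Segment, seg1: Segment) -> float:
    """Bradley-Terry probability that seg1 is preferred over seg0."""
    d = segment_return(reward, seg1) - segment_return(reward, seg0)
    # Numerically stable logistic sigmoid of the return difference.
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def preference_loss(reward: RewardFn,
                    queries: Sequence[Tuple[Segment, Segment]],
                    labels: Sequence[int]) -> float:
    """Mean binary cross-entropy; label 1 means the human preferred seg1."""
    eps = 1e-12  # keeps log() finite
    total = 0.0
    for (seg0, seg1), y in zip(queries, labels):
        p = min(max(preference_prob(reward, seg0, seg1), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(queries)


def toy_reward(step: Step) -> float:
    # Hypothetical reward: uses only the first feature.
    return step[0]


if __name__ == "__main__":
    bad = [(0.0,), (0.0,)]
    good = [(1.0,), (1.0,)]
    print(preference_prob(toy_reward, bad, good))           # about 0.881
    print(preference_loss(toy_reward, [(bad, good)], [1]))  # about 0.127
```

In a real setup the reward would be a trainable network and this loss would be minimized by gradient descent; the pure-Python version only makes the objective explicit.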

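This paper proposes two loss functions that bring unlabeled trajectories into the reward learning process and structure the embedding space of the reward model to reflect the structure of the state space with respect to action distances, aiming to improve sample efficiency and reward recovery. The method outperforms the current state-of-the-art algorithm PEBBLE in a robotic-arm manipulation domain.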


Exploiting Unlabeled Data for Feedback Efficient Human Preference based Reinforcement Learning

Telephone transcription data can be very noisy due to speech recognition
errors, disfluencies, etc. Not only is annotating such data very
challenging for annotators, but such data may also contain many annotation
errors even after the annotation job is completed, resulting in very poor
model performance. In this paper, we present an active learning framework that
leverages human-in-the-loop learning to identify data samples from the
annotated dataset for re-annotation that are more likely to contain annotation
errors. In this way, we largely reduce the need to re-annotate the
whole dataset. We conduct extensive experiments with our proposed approach for
Named Entity Recognition and observe that by re-annotating only about 6% of
the training instances in the whole dataset, the F1 score for a certain entity
type can be significantly improved by about 25%.
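The abstract does not state how the framework scores annotated samples, so the following is only a minimal sketch of one common heuristic, swapped in for illustration: flag sentences where a trained model assigns low probability to the tag the annotator chose, and send the lowest-scoring fraction (here the roughly 6% budget mentioned above) back for re-annotation. The per-token probability dictionaries, the function names, and the toy data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# One annotated sentence: per-token model tag distributions and the annotator's tags.
Sentence = Tuple[Sequence[Dict[str, float]], Sequence[str]]


def label_agreement(token_probs: Sequence[Dict[str, float]],
                    tags: Sequence[str]) -> float:
    """Lowest model probability given to any annotated tag in the sentence.

    A low value means the model strongly disagrees with at least one label,
    which this heuristic treats as a sign of a possible annotation error.
    """
    return min((probs.get(tag, 0.0) for probs, tag in zip(token_probs, tags)),
               default=1.0)


def select_for_reannotation(dataset: Sequence[Sentence],
                            budget_fraction: float = 0.06) -> List[int]:
    """Indices of the lowest-agreement sentences, up to the re-annotation budget."""
    if not dataset:
        return []
    k = max(1, round(budget_fraction * len(dataset)))  # nearest whole sentence count
    scores = [label_agreement(probs, tags) for probs, tags in dataset]
    ranked = sorted(range(len(dataset)), key=lambda i: scores[i])
    return ranked[:k]


if __name__ == "__main__":
    clean: Sentence = ([{"O": 0.9, "B-PER": 0.1}], ["O"])
    noisy: Sentence = ([{"O": 0.95, "B-PER": 0.05}], ["B-PER"])  # likely mislabeled
    data = [noisy if i in (7, 21, 40) else clean for i in range(50)]
    print(select_for_reannotation(data))  # [7, 21, 40]
```

One natural extension, not shown, is to retrain the model after each round of corrections and rescore the remaining data, which keeps the human in the loop.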

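This paper introduces an active learning framework that uses human-in-the-loop learning to identify data samples that are more likely to contain annotation errors and send them for re-annotation, significantly improving the F1 score for a specific entity type. The method achieves good results by re-annotating only about 6% of the training instances in the whole dataset.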