Recent empirical results have sparked a debate about whether or not Large
Language Models (LLMs) are capable of Theory of Mind (ToM). While some have
found LLMs to be successful on ToM evaluations such as the False Belief task
(Kosinski, 2023), others have argued that LLMs solve these tasks by exploiting
spurious correlations -- not representing beliefs -- since they fail on trivial
alterations to these tasks (Ullman, 2023). In this paper, we introduce SCALPEL:
a technique to generate targeted modifications for False Belief tasks to test
different specific hypotheses about why LLMs fail. We find that modifications
which make explicit common inferences -- such as that looking at a transparent
object implies recognizing its contents -- preserve LLMs' performance. This
suggests that LLMs' failures on modified ToM tasks could result from a lack of
more general commonsense reasoning, rather than a failure to represent mental
states. We argue that SCALPEL could be helpful for explaining LLM successes and
failures in other cases.

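By introducing the SCALPEL technique, we find that modifications which make common inferences explicit (such as that looking at a transparent object implies recognizing its contents) preserve LLMs' performance, suggesting that LLMs' failures on modified Theory of Mind tasks may stem from a lack of more general commonsense reasoning rather than a failure to represent mental states. We argue that SCALPEL could help explain LLM successes and failures in other cases.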
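The kind of targeted modification described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Vignette` class, the `make_explicit` helper, and the example story are hypothetical and are not taken from the paper's stimuli or code. The sketch only shows the general idea of inserting one sentence that states an otherwise implicit inference, so that the original and modified versions differ in exactly one respect.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Vignette:
    """A False Belief story: ordered sentences plus a completion prompt (hypothetical type)."""
    sentences: Tuple[str, ...]
    prompt: str

    def render(self) -> str:
        # Join the story sentences and the prompt into one text for a model.
        return " ".join(self.sentences + (self.prompt,))


def make_explicit(vignette: Vignette, after_index: int, inference: str) -> Vignette:
    """Return a new vignette with `inference` inserted after sentence `after_index`."""
    if not 0 <= after_index < len(vignette.sentences):
        raise IndexError("after_index must point at an existing sentence")
    head = vignette.sentences[: after_index + 1]
    tail = vignette.sentences[after_index + 1 :]
    return Vignette(head + (inference,) + tail, vignette.prompt)


# Invented example story (an assumption, not a stimulus from the paper).
base = Vignette(
    sentences=(
        "Sam finds a transparent bag filled with popcorn.",
        "The label on the bag says 'chocolate'.",
        "Sam looks at the bag.",
    ),
    prompt="Sam believes the bag is full of",
)

# Targeted modification: state the inference that looking at a
# transparent object implies recognizing its contents.
explicit = make_explicit(
    base, 2, "Sam sees through the bag and recognizes the popcorn inside."
)

print(base.render())
print(explicit.render())
```

Comparing a model's completions on `base.render()` and `explicit.render()` would then isolate whether the missing inference, rather than belief representation as such, accounts for a failure.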

Dissecting the Ullman Variations with a SCALPEL: Why do LLMs fail at Trivial Alterations to the False Belief Task?

Humans can attribute mental states to others, a capacity known as Theory of
Mind. However, it is unknown to what extent this ability results from an innate
biological endowment or from experience accrued through child development,
particularly exposure to language describing others' mental states. We test the
viability of the language exposure hypothesis by assessing whether models
exposed to large quantities of human language show evidence of Theory of
Mind. In pre-registered analyses, we present a linguistic version of the False
Belief Task, widely used to assess Theory of Mind, to both human participants
and a state-of-the-art Large Language Model, GPT-3. Both are sensitive to
others' beliefs, but while the language model significantly exceeds chance
behavior, it does not perform as well as the humans, nor does it explain the
full extent of their behavior -- despite being exposed to more language than a
human would in a lifetime. This suggests that while statistical learning from
language exposure may in part explain how humans develop Theory of Mind, other
mechanisms are also responsible.

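By assessing how exposure to large quantities of language affects Theory of Mind, the study finds that statistical learning from language can partly explain how Theory of Mind develops in humans, but that other mechanisms also play an important role: the state-of-the-art language model GPT-3, despite being exposed to more language, does not fully account for human behavior.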
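To make the phrase "significantly exceeds chance" concrete, the following is a minimal sketch of a one-sided exact binomial test against a 50% chance level. This is not the pre-registered analysis from the study; the function name `binomial_tail` and the counts in the example are hypothetical and chosen only to illustrate the calculation.

```python
from math import comb


def binomial_tail(successes: int, trials: int, chance: float = 0.5) -> float:
    """Probability of at least `successes` correct answers in `trials` under pure guessing."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical counts (an assumption, not data from the study):
# a model answers 8 of 10 two-alternative items correctly.
p_value = binomial_tail(8, 10)
print(f"one-sided p = {p_value:.4f}")
```

A small p-value indicates performance unlikely under guessing, which is a weaker claim than matching human performance or accounting for the full pattern of human responses.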