Recent scholarship on reasoning in LLMs has supplied evidence of impressive performance and flexible adaptation to machine-generated or human feedback. Nonmonotonic reasoning, crucial to human cognition for navigating the real world, remains a challenging yet understudied task. In this work, we study the nonmonotonic reasoning capabilities of seven state-of-the-art LLMs in one abstract and one commonsense reasoning task featuring generics, such as 'Birds fly', and exceptions, such as 'Penguins don't fly' (see Fig. 1). While LLMs exhibit reasoning patterns in accordance with human nonmonotonic reasoning abilities, they fail to maintain stable beliefs about the truth conditions of generics when supporting examples ('Owls fly') or unrelated information ('Lions have manes') are added. Our findings highlight pitfalls in attributing human reasoning behaviours to LLMs, as well as in assessing their general capabilities, while consistent reasoning remains elusive.

Are LLMs classical or nonmonotonic reasoners? Lessons from generics