In the present study, we investigate and compare reasoning in large language models (LLMs) and humans using a selection of cognitive psychology tools traditionally dedicated to the study of (bounded) rationality. To do so, we presented human participants and an array of pretrained LLMs with new variants of classical cognitive experiments and cross-compared their performance. Our results showed that most of the included models exhibited reasoning errors akin to those frequently ascribed to error-prone, heuristic-based human reasoning. Notwithstanding this superficial similarity, an in-depth comparison between humans and LLMs indicated important differences from human-like reasoning, with the models' limitations disappearing almost entirely in more recent LLM releases. Moreover, we show that while it is possible to devise strategies to induce better performance, humans and machines are not equally responsive to the same prompting schemes. We conclude by discussing the epistemological implications and challenges of comparing human and machine behavior for both artificial intelligence and cognitive psychology.

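This study compares reasoning in large language models (LLMs) and humans, using traditional cognitive psychology tools to investigate and compare their performance. The results show that most models exhibit reasoning errors similar to those of error-prone, heuristic-based human reasoning. An in-depth comparison, however, reveals important differences from human reasoning, and the models' limitations disappear almost entirely in more recent LLM releases. We also show that, although strategies can be devised to improve performance, humans and machines do not respond in the same way to the same prompting schemes. Finally, we discuss the epistemological implications and challenges of comparing human and machine behavior for artificial intelligence and cognitive psychology.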

Studying and improving the reasoning abilities of humans and machines