The development of highly fluent large language models (LLMs) has prompted increased interest in assessing their reasoning and problem-solving capabilities. We investigate whether several LLMs can solve a classic type of deductive reasoning problem from the cognitive science literature. The tested LLMs have limited ability to solve these problems in their conventional form. We performed follow-up experiments to investigate whether changes to the presentation format and content improve model performance. Performance does differ between conditions, but these changes do not improve overall performance. Moreover, we find that performance interacts with presentation format and content in unexpected ways that differ from human performance. Overall, our results suggest that LLMs have unique reasoning biases that are only partially predicted by human reasoning performance.

Evaluating the Reasoning Abilities of Large Language Models