Deductive reasoning plays a pivotal role in the formulation of sound and cohesive arguments. It allows individuals to draw conclusions that logically follow, given the truth value of the information provided. Recent progress in the domain of large language models (LLMs) has showcased their capability in executing deductive reasoning tasks. Nonetheless, a significant portion of research primarily assesses the accuracy of LLMs in solving such tasks, often overlooking a deeper analysis of their reasoning behavior. In this study, we draw upon principles from cognitive psychology to examine inferential strategies employed by LLMs, through a detailed evaluation of their responses to propositional logic problems. Our findings indicate that LLMs display reasoning patterns akin to those observed in humans, including strategies like $\textit{supposition following}$ or $\textit{chain construction}$. Moreover, our research demonstrates that the architecture and scale of the model significantly affect its preferred method of reasoning, with more advanced models tending to adopt strategies more frequently than less sophisticated ones. Importantly, we assert that a model's accuracy, that is the correctness of its final conclusion, does not necessarily reflect the validity of its reasoning process. This distinction underscores the necessity for more nuanced evaluation procedures in the field.

该研究通过对大型语言模型在命题逻辑问题上的响应进行细致评估，利用认知心理学原理探讨了模型使用的推理策略。结果发现，大型语言模型展示出类似于人类的推理模式，包括“解释跟踪”和“链式构建”等策略。此外，该研究表明模型的架构和规模显著影响其首选的推理方法，较先进的模型更倾向于频繁使用这些策略。模型的准确性并不必然反映其推理过程的有效性，这一区别强调了该领域需要更为精细的评估程序。

人类和大型语言模型在演绎推理中的推理策略比较