The "Reversal Curse" refers to the scenario where auto-regressive decoder large language models (LLMs), such as ChatGPT, are trained on "A is B" but fail to learn "B is A", demonstrating a basic failure of logical deduction. This raises a red flag for the use of GPT models in certain general tasks such as constructing knowledge graphs, since knowledge graphs rely on this symmetry principle. In our study, we examined a bidirectional LLM, BERT, and found that it is immune to the reversal curse. Driven by ongoing efforts to construct biomedical knowledge graphs with LLMs, we also evaluated more complex but essential deductive reasoning capabilities. We first trained encoder and decoder language models to master the intersection ($\cap$) and union ($\cup$) operations on two sets, and then assessed their ability to infer different combinations of union ($\cup$) and intersection ($\cap$) operations on three newly created sets. The findings showed that both encoder and decoder language models, trained on two-set tasks (union/intersection), were proficient in the two-set scenarios but encountered difficulties with operations involving three sets (various combinations of union and intersection). Our research highlights the distinct characteristics of encoder and decoder models in simple and complex logical reasoning. In practice, the choice between BERT and GPT should be guided by the specific requirements and nature of the task at hand, leveraging their respective strengths in bidirectional context comprehension and sequence prediction.
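
To make the two-set versus three-set distinction concrete, the following is a minimal sketch of how such items could be enumerated. It is an illustration under stated assumptions, not the data pipeline used in the study: the function names `two_set_items` and `three_set_items`, the expression strings, and the example sets are all hypothetical, and only left-nested compositions of the form (A op B) op C are shown.

```python
import operator
from itertools import product

# Symbol -> Python set operator. The symbols mirror the abstract's notation.
OPS = {"∪": operator.or_, "∩": operator.and_}


def two_set_items(a, b):
    """Single operations on two sets (the kind of task the models were trained on)."""
    return {f"A {sym} B": op(a, b) for sym, op in OPS.items()}


def three_set_items(a, b, c):
    """Left-nested compositions (A op1 B) op2 C on three sets (the harder evaluation case)."""
    items = {}
    for (s1, op1), (s2, op2) in product(OPS.items(), repeat=2):
        items[f"(A {s1} B) {s2} C"] = op2(op1(a, b), c)
    return items


if __name__ == "__main__":
    # Hypothetical example sets; any hashable elements would work.
    A, B, C = {1, 2, 3}, {2, 3, 4}, {3, 5}
    for expr, result in {**two_set_items(A, B), **three_set_items(A, B, C)}.items():
        print(expr, "=", sorted(result))
```

A model that has only seen the two-set items must compose the operations to answer the three-set items, which is the step where both model families reportedly struggled.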

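In this study, we investigate the limitations of large language models in logical reasoning and find that auto-regressive decoder models such as ChatGPT, when trained on "A is B", often fail to learn "B is A", revealing a failure of logical deduction. We evaluate the bidirectional language model BERT and find that it is immune to the reversal curse, and we also explore the complex deductive reasoning capabilities needed for medical knowledge graph construction. Although both encoder and decoder models perform well on tasks involving two sets (union/intersection), they encounter difficulties with operations involving three sets (various combinations of union and intersection). The choice between BERT and GPT models should therefore be guided by the specific requirements and nature of the task, making full use of their respective strengths in bidirectional context comprehension and sequence prediction.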

Not All Large Language Models (LLMs) Succumb to the "Reversal Curse": A Comparative Study of the Reasoning Abilities of BERT and GPT Models