Although large language models (LLMs) have apparently acquired a certain level of grammatical knowledge and the ability to make generalizations, they fail to interpret negation, a crucial step in Natural Language Processing. We try to clarify the reasons for the sub-optimal performance of LLMs in understanding negation. We introduce a large, semi-automatically generated dataset of circa 400,000 descriptive sentences about commonsense knowledge, each of which can be true or false, with negation present in different forms in about 2/3 of the corpus. We evaluated the largest available open LLMs on our dataset in a zero-shot setting to gauge their generalization and inference capabilities, and we also fine-tuned some of the models to assess whether the understanding of negation can be trained. Our findings show that, while LLMs are proficient at classifying affirmative sentences, they struggle with negative sentences and lack a deep understanding of negation, often relying on superficial cues. Although fine-tuning the models on negative sentences improves their performance, the lack of generalization in handling negation persists, highlighting the ongoing challenges LLMs face in understanding negation and generalizing over it. The dataset and code are publicly available.

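Large language models perform sub-optimally at understanding negation. This study introduces a large-scale, semi-automatically generated dataset of commonsense knowledge comprising about 400,000 descriptive sentences, roughly 2/3 of which contain some form of negation, and tests existing open-source language models on it in a zero-shot setting. The results show that although the models are highly accurate on affirmative sentences, they struggle with negative sentences and lack a deep understanding of negation. Fine-tuning the models on negative sentences improves their performance, but their ability to generalize when handling negation remains insufficient, which highlights the challenges large language models still face in understanding negation and generalizing over it.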

This is not a Dataset: A Large Negation Benchmark to Challenge Large Language Models