Humans can readily judge the number of objects in a visual scene, even without counting, and such a skill has been documented in a variety of animal species and in babies prior to language development and formal schooling. Numerical judgments are error-free for small sets, while for larger collections responses become approximate, with variability increasing proportionally to the target number. This response pattern is observed for items of all kinds, despite variation in object features (such as color or shape), suggesting that our visual number sense relies on abstract representations of numerosity. Here, we investigated whether generative Artificial Intelligence (AI) models based on large-scale transformer architectures can reliably name the number of objects in simple visual stimuli or generate images containing a target number of items in the 1-10 range. Surprisingly, none of the foundation models considered performed in a human-like way: They all made striking errors even with small numbers, the response variability often did not increase in a systematic way, and the pattern of errors varied with object category. Our findings demonstrate that advanced AI systems still lack a basic ability that supports an intuitive understanding of numbers, which in humans is foundational for numeracy and mathematical development.

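This study investigated whether generative AI models based on large-scale Transformer architectures can reliably name the number of objects in simple visual stimuli or generate images containing a target number of objects in the 1-10 range. Surprisingly, none of the foundation models considered behaved in a human-like way: they made striking errors even with small numbers, response variability often did not increase systematically, and the pattern of errors varied with object category. Our findings indicate that advanced AI systems still lack a basic ability that supports an intuitive understanding of numbers, an ability that is crucial for human numeracy and mathematical development.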

Large-Scale Generative AI Models Lack Visual Number Sense