Do neural network models of vision learn brain-aligned representations because they share architectural constraints and task objectives with biological vision or because they learn universal features of natural image processing? We characterized the universality of hundreds of thousands of representational dimensions from visual neural networks with varied construction. We found that networks with varied architectures and task objectives learn to represent natural images using a shared set of latent dimensions, despite appearing highly distinct at a surface level. Next, by comparing these networks with human brain representations measured with fMRI, we found that the most brain-aligned representations in neural networks are those that are universal and independent of a network's specific characteristics. Remarkably, each network can be reduced to fewer than ten of its most universal dimensions with little impact on its representational similarity to the human brain. These results suggest that the underlying similarities between artificial and biological vision are primarily governed by a core set of universal image representations that are convergently learned by diverse systems.

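This study examines whether neural network models of vision learn brain-aligned representations because they share architectural constraints and task objectives with biological vision, or because they learn universal features of natural image processing. It finds that networks with varied architectures and task objectives represent natural images using a shared set of latent dimensions, which suggests that the similarities between artificial and biological vision are primarily governed by a core set of universal image representations.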

Universal dimensions of visual representation