BriefGPT.xyz
Apr, 2022
Visual Spatial Reasoning
Fangyu Liu, Guy Emerson, Nigel Collier
TL;DR
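This study introduces the Visual Spatial Reasoning (VSR) dataset, which contains more than 10k annotated natural text-image pairs in English covering 66 types of spatial relations. The study shows that current vision-language models reach only about 70% accuracy and fail to recognize relations concerning the orientation of objects.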
Abstract
Spatial relations are fundamental to human cognition and are the most basic knowledge for us to understand and communicate about our physical surroundings. In this paper, we ask the critical question: Are current vision …