BriefGPT.xyz
Oct, 2023
Problems in Vision-Language Models: Investigating Their Challenges with Spatial Reasoning
What's "up" with vision-language models? Investigating their struggle with spatial reasoning
Amita Kamath, Jack Hessel, Kai-Wei Chang
TL;DR
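By creating new benchmark datasets for semantic understanding, the study shows that recent vision-language models perform poorly at recognizing basic spatial relations, which it attributes to the lack of reliable data for learning spatial relations in commonly used datasets such as VQAv2.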
Abstract
Recent vision-language (VL) models are powerful, but can they reliably distinguish "right" from "left"? We curate three new corpora to quantify model comprehension of such basic spatial relations. These tests iso…