Jun, 2024
GSR-BENCH: A Benchmark for Grounded Spatial Reasoning Evaluation via Multimodal LLMs
Navid Rajabi, Jana Kosecka
TL;DR
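This paper extends the What'sUp dataset to propose a comprehensive evaluation of spatial relationship understanding. It evaluates 27 different models, including early vision-language models (VLMs) and three classes of multimodal LLMs (MLLMs), to measure their performance on the task and to examine how that performance varies with model scale.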
Abstract
The ability to understand and reason about spatial relationships between objects in images is an important component of visual reasoning. This skill rests on the ability to recognize and localize objects of inter…