GSR-BENCH: A Benchmark for Grounded Spatial Reasoning Evaluation via Multimodal LLMs

Jun, 2024

Navid Rajabi, Jana Kosecka

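TL;DR: This paper extends the What'sUp dataset and proposes a comprehensive evaluation of spatial relationship understanding. It evaluates 27 different models, including earlier vision-language models (VLMs) and three classes of multimodal LLMs (MLLMs), to benchmark their performance on the task and examine how that performance scales.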

Abstract

The ability to understand and reason about spatial relationships between objects in images is an important component of visual reasoning. This skill rests on the ability to recognize and localize objects of inter