An-Chieh Cheng, Hongxu Yin, Yang Fu, Qiushan Guo, Ruihan Yang...
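TL;DR: Through data processing and the integration of depth information, Spatial Region GPT (SpatialRGPT) enhances the spatial perception and reasoning capabilities of Vision Language Models (VLMs) and significantly improves performance on spatial reasoning tasks.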
Abstract
Vision language models (VLMs) have demonstrated remarkable performance in 2D vision and language tasks. However, their ability to reason about spatial arrangements remains limited. In this work, we introduce spatial reg