BriefGPT.xyz
Jun, 2024
TopViewRS: Vision-Language Models as Top-View Spatial Reasoners
Chengzu Li, Caiqi Zhang, Han Zhou, Nigel Collier, Anna Korhonen...
TL;DR
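Using the TopViewRS dataset, the authors evaluate representative open-source and closed-source vision-language models on perception and reasoning tasks of varying complexity. They find that model performance falls well below the human average, which underscores the urgent need to improve the models' spatial reasoning capabilities and provides a foundation for further research.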
Abstract
Top-view perspective denotes a typical way in which humans read and reason over different types of maps, and it is vital for localization and navigation of humans as well as of 'non-human' agents, such as the ones backed by large ...