BriefGPT.xyz
Feb, 2024
Exploring Perceptual Limitation of Multimodal Large Language Models
Jiarui Zhang, Jinyi Hu, Mahyar Khayatkhoei, Filip Ilievski, Maosong Sun
TL;DR
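This work studies how well multimodal large language models perceive small visual objects. It finds that factors such as object quality, object size, and the position of distractors can all significantly reduce a model's accuracy in answering visual questions. The study explores the perceptual limitations of multimodal large language models and offers a new evaluation protocol for analyzing the perception of future models.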
Abstract
Multimodal Large Language Models (MLLMs) have recently shown remarkable perceptual capability in answering visual questions; however, little is known about the limits of their perception. In particular, while pri
→