BriefGPT.xyz
Nov 2024
Enhancing Visual Reasoning with Autonomous Imagination in Multimodal Large Language Models
Jingming Liu, Yumeng Li, Boyuan Xiao, Yichang Jian, Ziang Qin...
TL;DR
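This study addresses the difficulty current multimodal large language models have in reasoning about complex visual scenes. It proposes a new visual reasoning paradigm in which the model autonomously modifies the input scene according to its current reasoning state. The results show that this approach substantially improves the model's reasoning ability and outperforms conventional clue-finding techniques.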
Abstract
There have been recent efforts to extend the Chain-of-Thought (CoT) paradigm to Multimodal Large Language Models (MLLMs) by finding visual clues in the input scene, advancing the visual reasoning ability of MLLMs.