BriefGPT.xyz
Nov, 2023
A Unified Deep-Learning Framework for Visual-Language Tasks
InfMLLM: A Unified Framework for Visual-Language Tasks
Qiang Zhou, Zhibin Wang, Wei Chu, Yinghui Xu, Hao Li...
TL;DR
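By introducing a pool-adapter module that preserves the positional information of visual embeddings, our InfMLLM method matches or surpasses recent multimodal large language models on tasks such as image captioning, visual question answering, and visual grounding.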
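The summary does not describe how the pool-adapter works internally. The sketch below illustrates the general idea under an assumption: the adapter arranges the vision encoder's patch embeddings on their 2D grid and average-pools fixed windows. This shrinks the number of visual tokens while each pooled token still corresponds to one spatial region, in row-major order. The function `pool_adapter` and its parameters are hypothetical and are not taken from the paper.

```python
from typing import List

Vector = List[float]


def pool_adapter(tokens: List[Vector], grid: int, out_grid: int) -> List[Vector]:
    """Hypothetical pool-adapter sketch (assumption, not the paper's code):
    average-pool a grid x grid map of patch embeddings down to
    out_grid x out_grid, keeping the row-major spatial order of the output."""
    if len(tokens) != grid * grid:
        raise ValueError("expected grid * grid patch embeddings")
    if grid % out_grid != 0:
        raise ValueError("grid must be divisible by out_grid")
    k = grid // out_grid  # side length of each square pooling window
    dim = len(tokens[0])
    pooled: List[Vector] = []
    for oy in range(out_grid):          # output row
        for ox in range(out_grid):      # output column
            acc = [0.0] * dim
            for dy in range(k):
                for dx in range(k):
                    # Row-major index of the source patch inside this window.
                    src = tokens[(oy * k + dy) * grid + (ox * k + dx)]
                    for d in range(dim):
                        acc[d] += src[d]
            # Each output token is the mean of its k x k window, so its list
            # position still encodes where the region sits in the image.
            pooled.append([v / (k * k) for v in acc])
    return pooled


if __name__ == "__main__":
    # 4 x 4 grid of 1-D embeddings whose value is the patch index.
    tokens = [[float(i)] for i in range(16)]
    print(pool_adapter(tokens, grid=4, out_grid=2))  # [[2.5], [4.5], [10.5], [12.5]]
```

Average pooling over fixed windows is only one way to reduce visual tokens without discarding spatial layout. A learned query-based resampler, by contrast, mixes tokens from across the whole image and does not preserve this one-to-one region mapping.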
Abstract
Large language models (LLMs) have proven their remarkable versatility in handling a comprehensive range of language-centric applications. To expand LLMs' capabilities to a broader spectrum of modal inputs, multimodal large language models …