BriefGPT.xyz
May, 2024
Dense Connector for MLLMs
Huanjin Yao, Wenhao Wu, Taojiannan Yang, YuXin Song, Mengxi Zhang...
TL;DR
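We introduce the Dense Connector, a simple, effective, plug-and-play vision-language connector that significantly enhances existing multimodal large language models (MLLMs) by leveraging multi-layer visual features. Despite being trained only on images, it also demonstrates remarkable zero-shot capabilities in video understanding.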
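To make "leveraging multi-layer visual features" more concrete, here is a minimal sketch of one way such a connector could work: features for the same token are taken from several visual-encoder layers, concatenated along the channel dimension, and projected to the language model's embedding size. This is an illustration under assumptions, not the authors' implementation. The function name `dense_connect`, the choice of three layers, the single linear projection, and all sizes are hypothetical.

```python
import random


def dense_connect(layer_features, weight):
    """Fuse per-token features from several encoder layers (hypothetical sketch).

    layer_features: list of layers; each layer is a list of tokens;
                    each token is a list of floats.
    weight: projection matrix with shape (sum of layer dims, out_dim).
    """
    num_tokens = len(layer_features[0])
    fused = []
    for t in range(num_tokens):
        # Channel-wise concatenation of the same token across the chosen layers.
        concat = [x for layer in layer_features for x in layer[t]]
        # Linear projection into the (assumed) language-model embedding size.
        out = [
            sum(c * w_row[j] for c, w_row in zip(concat, weight))
            for j in range(len(weight[0]))
        ]
        fused.append(out)
    return fused


random.seed(0)
num_tokens, dim, out_dim = 4, 3, 5
# Random stand-ins for features from a shallow, a middle, and the final layer.
layers = [
    [[random.random() for _ in range(dim)] for _ in range(num_tokens)]
    for _ in range(3)
]
weight = [[random.random() for _ in range(out_dim)] for _ in range(3 * dim)]
tokens = dense_connect(layers, weight)
print(len(tokens), len(tokens[0]))  # 4 5
```

The number of visual tokens is unchanged by this fusion; only the per-token channel width grows before projection, which is one reason a connector of this kind can be dropped into an existing pipeline.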
Abstract
Do we fully leverage the potential of the visual encoder in multimodal large language models (MLLMs)? The recent outstanding performance of MLLMs in multimodal understanding has garnered broad attention from both aca