BriefGPT.xyz
Sep, 2023
An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models
Yadong Lu, Chunyuan Li, Haotian Liu, Jianwei Yang, Jianfeng Gao...
TL;DR
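This work conducts a scaling study of visually instruction-tuned open-source large multimodal models and examines the variables that affect their multimodal and language capabilities. It finds that scaling up the model consistently improves performance and that LoRA/QLoRA tuning performs comparably to full-model fine-tuning. It also highlights the importance of higher image resolution and of mixing multimodal and language-only data for performance, and shows that visual instruction tuning can sometimes improve pure language capabilities.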
Abstract
Visual instruction tuning has recently shown encouraging progress with open-source large multimodal models (LMM) such as LLaVA and MiniGPT-4. However, most existing studies of open-source LMM are performed using