BriefGPT.xyz
Oct, 2023
Improved Baselines with Visual Instruction Tuning
Haotian Liu, Chunyuan Li, Yuheng Li, Yong Jae Lee
TL;DR
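With simple modifications to LLaVA, namely using CLIP-ViT-L-336px with an MLP projection and adding academic-task-oriented VQA data, we establish stronger baselines that achieve state-of-the-art results across 11 benchmarks.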
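To make the "MLP projection" idea concrete, here is a minimal sketch of a connector that maps vision-encoder patch features into a language model's embedding space. The summary only says an MLP is used, so the two-layer depth, the GELU activation, the toy dimensions, and every name below (`mlp_projector`, `init_linear`, `VISION_DIM`, `LLM_DIM`) are illustrative assumptions rather than the authors' implementation.

```python
import math
import random


def gelu(x):
    # Tanh approximation of GELU (assumed activation; the summary does not name one).
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def init_linear(d_in, d_out, rng):
    # Small uniform weights and zero bias for a d_in -> d_out linear layer.
    scale = 1.0 / math.sqrt(d_in)
    weight = [[rng.uniform(-scale, scale) for _ in range(d_out)] for _ in range(d_in)]
    bias = [0.0] * d_out
    return weight, bias


def linear(tokens, weight, bias):
    # Apply the same linear map to every token vector independently.
    return [
        [sum(t[i] * weight[i][j] for i in range(len(t))) + bias[j] for j in range(len(bias))]
        for t in tokens
    ]


def mlp_projector(vision_tokens, params):
    # Hypothetical two-layer connector: linear -> GELU -> linear.
    (w1, b1), (w2, b2) = params
    hidden = linear(vision_tokens, w1, b1)
    hidden = [[gelu(v) for v in row] for row in hidden]
    return linear(hidden, w2, b2)


rng = random.Random(0)
# Toy sizes chosen so the sketch runs instantly; real encoders and LLMs are far wider.
NUM_TOKENS, VISION_DIM, LLM_DIM = 4, 8, 16
params = (init_linear(VISION_DIM, LLM_DIM, rng), init_linear(LLM_DIM, LLM_DIM, rng))
vision_tokens = [[rng.gauss(0, 1) for _ in range(VISION_DIM)] for _ in range(NUM_TOKENS)]

projected = mlp_projector(vision_tokens, params)
print(len(projected), len(projected[0]))  # prints: 4 16
```

Each image patch token is projected independently, so the number of tokens is unchanged and only the feature width changes to match the language model's embedding size. A single linear layer would be the special case with the activation and second layer removed.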
Abstract
Large multimodal models (LMM) have recently shown encouraging progress with visual instruction tuning. In this note, we show that the fully-connected vision-language cross-modal connector in