BriefGPT.xyz
Jun, 2024
MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning
Xiangyu Zhao, Xiangtai Li, Haodong Duan, Haian Huang, Yining Li...
TL;DR
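MG-LLaVA, a new approach that combines visual features at multiple granularities with a language model, performs strongly on perception tasks, outperforms existing models of comparable parameter size, and shows strong object recognition capability.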
Abstract
Multi-modal large language models (MLLMs) have made significant strides in various visual understanding tasks. However, the majority of these models are constrained to process low-resolution images, which limits …