BriefGPT.xyz
Jul, 2024
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models
Feng Li, Renrui Zhang, Hao Zhang, Yuanhan Zhang, Bo Li...
TL;DR
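LLaVA-NeXT-Interleave handles multi-image, video, 3D, and single-image scenarios in LMMs within a single model. It achieves strong results on multi-image, video, and 3D benchmarks and shows several emerging capabilities.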
Abstract
Visual instruction tuning has made considerable strides in enhancing the capabilities of Large Multimodal Models (LMMs). However, existing open …