BriefGPT.xyz
Mar, 2024
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
Yuzhang Shang, Mu Cai, Bingxin Xu, Yong Jae Lee, Yan Yan
TL;DR
By pruning visual tokens and merging similar ones, we propose PruMerge, an adaptive visual-token compression method that substantially reduces the number of visual tokens while maintaining comparable model performance.
Abstract
Large multimodal models (LMMs) have shown significant reasoning capabilities by connecting a visual encoder and a large language model. LMMs typically use a fixed amount of visual tokens, such as the penultimate …
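The prune-and-merge idea summarized in the TL;DR can be sketched as below. This is a minimal illustration, not the paper's exact procedure: the importance scores are assumed given (e.g., attention-based), and the merging rule here (cosine-similarity nearest-neighbor running average) is an illustrative assumption.

```python
import numpy as np

def prune_and_merge(tokens, scores, keep_ratio=0.25):
    """Sketch of adaptive visual-token reduction: keep the highest-scoring
    tokens, then merge each pruned token into its most similar kept token
    via a running average.

    tokens: (N, D) array of visual token features
    scores: (N,) importance scores (assumed precomputed)
    """
    n = len(tokens)
    n_keep = max(1, int(n * keep_ratio))
    keep_idx = np.argsort(scores)[-n_keep:]                # most important tokens
    prune_idx = np.setdiff1d(np.arange(n), keep_idx)       # everything else

    kept = tokens[keep_idx].astype(float).copy()
    counts = np.ones(n_keep)                               # merge counts per kept slot

    def unit(x):
        return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)

    # cosine similarity between pruned and kept tokens, nearest-neighbor assign
    sim = unit(tokens[prune_idx].astype(float)) @ unit(kept).T
    assign = sim.argmax(axis=1)
    for p, k in zip(prune_idx, assign):
        kept[k] = (kept[k] * counts[k] + tokens[p]) / (counts[k] + 1)
        counts[k] += 1
    return kept
```

For example, compressing 16 tokens at a 25% keep ratio yields 4 output tokens, each absorbing the pruned tokens most similar to it.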