BriefGPT.xyz
Mar, 2024
Multi-modal Auto-regressive Modeling via Visual Words
Tianshuo Peng, Zuchao Li, Lefei Zhang, Hai Zhao, Ping Wang...
TL;DR
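We successfully perform multi-modal auto-regressive modeling and, for the first time, propose the concept of visual words, which maps visual features to probability distributions over the LLM's vocabulary and thereby provides supervision for visual modeling. Experimental results and ablation studies on 5 VQA tasks and 4 benchmark toolkits demonstrate the strong performance of our proposed method.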
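The mapping described in the TL;DR can be illustrated with a minimal sketch. It is not taken from the paper: it assumes each visual feature is a vector in the LLM's hidden space and that a single projection matrix (here the hypothetical `vocab_projection`) produces vocabulary logits, which a softmax turns into a probability distribution. The array shapes and random inputs are placeholders.

```python
import numpy as np

def visual_words(visual_features, vocab_projection):
    """Map each visual feature vector to a probability distribution over a vocabulary.

    visual_features:  (n_patches, hidden_dim) array, assumed to live in the LLM's hidden space.
    vocab_projection: (hidden_dim, vocab_size) array, a hypothetical stand-in for the
                      layer that scores vocabulary entries.
    """
    logits = visual_features @ vocab_projection            # (n_patches, vocab_size)
    logits = logits - logits.max(axis=-1, keepdims=True)   # subtract row max for numerical stability
    exp_logits = np.exp(logits)
    return exp_logits / exp_logits.sum(axis=-1, keepdims=True)  # softmax over the vocabulary

# Placeholder data: 4 image patches, hidden size 8, vocabulary of 16 tokens.
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8))
projection = rng.normal(size=(8, 16))
dist = visual_words(features, projection)
```

Each row of `dist` is one such distribution. Under the assumptions above, distributions of this kind could serve as soft targets when supervising visual positions in an auto-regressive objective.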
Abstract
Large language models (LLMs), benefiting from the auto-regressive modelling approach performed on massive unannotated text corpora, demonstrate powerful perceptual and reasoning capabilities. However, as for ex