BriefGPT.xyz
Nov, 2023
Language Grounded QFormer for Efficient Vision Language Understanding
Moulik Choraria, Nitesh Sekhar, Yue Wu, Xu Zhang, Prateek Singhal...
TL;DR
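We propose a more efficient QFormer-based strategy for image-language alignment and demonstrate that, compared with existing baseline methods, it improves the efficiency of image-language pretraining.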
Abstract
Large-scale pretraining and instruction tuning have been successful for training general-purpose language models with broad competencies. However, extending to general-purpose vision-language models is challenging...