BriefGPT.xyz
Mar, 2024
Multi-modal Attribute Prompting for Vision-Language Models
Xin Liu, Jiamin Wu, Tianzhu Zhang
TL;DR
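We propose Multi-modal Attribute Prompting (MAP), which jointly explores textual attribute prompting, visual attribute prompting, and attribute-level alignment to address some limitations of large pre-trained vision-language models (VLMs) in few-shot settings. Experiments show that our method outperforms existing methods on 11 datasets.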
Abstract
Large pre-trained vision-language models (VLMs), like CLIP, exhibit strong generalization ability to downstream tasks but struggle in few-shot scenarios. Existing prompting techniques primarily focus on global te