BriefGPT.xyz
Oct 2024
Fine-Tuning CLIP's Last Visual Projector: A Few-Shot Cornucopia
Mohammad Fahes, Tuan-Hung Vu, Andrei Bursuc, Patrick Pérez, Raoul de Charette
TL;DR
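This work addresses how to adapt CLIP, a contrastively pretrained vision-language model, to few-shot classification. We propose a new method that fine-tunes the last projection matrix of the vision encoder without introducing any additional parameters to optimize, achieving performance on par with or better than existing state-of-the-art approaches on multiple benchmarks. The method may help advance research on few-shot classification and domain generalization.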
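As a rough illustration of the idea summarized above, the NumPy sketch below trains only a projection matrix while everything else stays frozen. It is a toy sketch under stated assumptions, not the authors' implementation: the dimensions, the random stand-ins for backbone features and class text embeddings, the logit scale, the learning rate, and the plain gradient-descent loop are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (hypothetical): pre-projection width, joint-embedding width, classes, shots.
d_in, d_out, n_classes, n_shots = 16, 8, 3, 4

def l2_normalize(x):
    """Scale each row to unit Euclidean norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Random stand-ins for frozen quantities (assumption: real CLIP outputs would go here).
labels = np.repeat(np.arange(n_classes), n_shots)
class_means = rng.normal(size=(n_classes, d_in))
feats = class_means[labels] + 0.5 * rng.normal(size=(labels.size, d_in))  # frozen backbone features
text_emb = l2_normalize(rng.normal(size=(n_classes, d_out)))              # frozen class text embeddings

# "Pretrained" last visual projection; this is the only trainable tensor.
W0 = rng.normal(size=(d_in, d_out)) / np.sqrt(d_in)
W = W0.copy()

def loss_and_grad(W, feats, labels, text_emb, scale=10.0):
    """Cross-entropy over scaled cosine similarities, with gradient w.r.t. W only."""
    z = feats @ W
    norm = np.linalg.norm(z, axis=1, keepdims=True)
    zn = z / norm
    logits = scale * zn @ text_emb.T
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    n = labels.size
    loss = -np.log(p[np.arange(n), labels]).mean()
    dlogits = p.copy()
    dlogits[np.arange(n), labels] -= 1.0
    dlogits /= n
    dzn = scale * dlogits @ text_emb
    # Backpropagate through the row-wise L2 normalization.
    dz = (dzn - zn * np.sum(dzn * zn, axis=1, keepdims=True)) / norm
    return loss, feats.T @ dz

initial_loss, _ = loss_and_grad(W, feats, labels, text_emb)
lr = 0.05  # hypothetical step size
for _ in range(300):
    _, grad = loss_and_grad(W, feats, labels, text_emb)
    W -= lr * grad  # update the projection matrix; nothing else changes
final_loss, _ = loss_and_grad(W, feats, labels, text_emb)

print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

The point of the sketch is that no new parameters are introduced: `W` keeps the shape of the original projection, and the backbone features and text embeddings are never updated.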
Abstract
We consider the problem of adapting a contrastively pretrained vision-language model like CLIP (Radford et al., 2021) for few-shot classification. The existing literature addresses this problem by learning a line…