BriefGPT.xyz
Jun, 2024
Efficient and Long-Tailed Generalization for Pre-trained Vision-Language Model
Jiang-Xin Shi, Chi Zhang, Tong Wei, Yu-Feng Li
TL;DR
Addressing the challenge of adapting CLIP to real-world conditions, we propose a novel framework named Candle that achieves efficient, long-tailed generalization by introducing a new loss function, cross-modal attention, and virtual prototypes. The method demonstrates superior performance across 11 diverse datasets while greatly reducing training time.
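The TL;DR mentions a new loss function for long-tailed generalization. The paper's exact formulation is not given here; as a minimal, illustrative sketch of the general idea behind long-tailed losses, the logit-adjustment trick below shifts each class logit by the log of its prior, so rare classes must be predicted with a larger margin during training. All names and numbers are hypothetical, not Candle's actual loss.

```python
import math

def logit_adjusted_ce(logits, target, class_counts, tau=1.0):
    """Cross-entropy on logits shifted by tau * log(class prior) --
    a generic logit-adjustment remedy for long-tailed label
    distributions (illustrative only, not the Candle loss)."""
    total = sum(class_counts)
    adjusted = [z + tau * math.log(c / total)
                for z, c in zip(logits, class_counts)]
    # numerically stable log-sum-exp
    m = max(adjusted)
    log_sum = m + math.log(sum(math.exp(a - m) for a in adjusted))
    return log_sum - adjusted[target]

# toy example: 4 classes with long-tailed counts; the rare class
# (index 3) has its logit pushed down by a large negative prior,
# so its loss is higher than under plain cross-entropy (tau=0)
loss = logit_adjusted_ce([2.0, 1.0, 0.5, 0.2],
                         target=3, class_counts=[100, 50, 10, 2])
```

With `tau=0` the function reduces to ordinary cross-entropy, which makes the effect of the prior term easy to isolate in experiments.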
Abstract
Pre-trained vision-language models like CLIP have shown powerful zero-shot inference ability via image-text matching and prove to be strong few-shot learners in various …
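The abstract refers to zero-shot inference via image-text matching. A minimal sketch of that scheme, using toy hand-made embeddings rather than real CLIP encoders: classify an image by the cosine similarity between its embedding and the text embedding of each candidate prompt.

```python
def zero_shot_classify(image_vec, text_vecs):
    """Return the index of the text embedding with the highest
    cosine similarity to the image embedding -- the standard
    CLIP-style zero-shot matching scheme (toy vectors, not CLIP)."""
    def normalize(v):
        s = sum(x * x for x in v) ** 0.5
        return [x / s for x in v]
    img = normalize(image_vec)
    sims = [sum(a * b for a, b in zip(img, normalize(t)))
            for t in text_vecs]
    return max(range(len(sims)), key=sims.__getitem__)

# toy 3-dim embeddings standing in for prompts such as
# "a photo of a cat" / "a photo of a dog"
label = zero_shot_classify([0.9, 0.1, 0.0],
                           [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
# → 0 (the image embedding is closest to the first prompt)
```

In the real setting both embeddings come from CLIP's image and text encoders and are L2-normalized, so the dot product is exactly the cosine similarity.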