BriefGPT.xyz
Jul, 2024
Semantic Compositions Enhance Vision-Language Contrastive Learning
Maxwell Aladago, Lorenzo Torresani, Soroush Vosoughi
TL;DR
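By introducing semantic composition samples through a simple technique called CLIP-C, we significantly improve zero-shot image classification and cross-modal retrieval, with no additional computational overhead and no increase in model parameters.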
Abstract
In the field of vision-language contrastive learning, models such as CLIP capitalize on matched image-caption pairs as positive examples and leverage within-batch non-matching pairs as negatives. This approach ha
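To make the positive/negative setup described in the abstract concrete, here is a minimal sketch of a standard CLIP-style symmetric in-batch contrastive loss. It is not the CLIP-C method itself, whose details are not given in this summary. The function names (`l2_normalize`, `log_softmax`, `clip_style_loss`) and the default temperature are illustrative assumptions, and the embeddings are plain Python lists rather than encoder outputs.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def clip_style_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric in-batch contrastive loss (illustrative, hypothetical helper).

    Pair i (image i, caption i) is the positive; every other caption or
    image in the batch serves as a negative. The temperature default is
    an assumption chosen for illustration.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)

    # logits[i][j]: cosine similarity of image i and caption j, scaled by 1/temperature.
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    # Image-to-text direction: for each image, the matching caption is the target.
    i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n

    # Text-to-image direction: same computation over the columns.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    t2i = -sum(log_softmax(columns[j])[j] for j in range(n)) / n

    return 0.5 * (i2t + t2i)


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    captions_matched = [[1.0, 0.0], [0.0, 1.0]]
    captions_swapped = [[0.0, 1.0], [1.0, 0.0]]
    print(clip_style_loss(images, captions_matched))
    print(clip_style_loss(images, captions_swapped))
```

In this toy batch, correctly aligned pairs give a loss near zero, while swapping the captions makes every positive look like a negative and the loss grows sharply.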