BriefGPT.xyz
Oct, 2023
From Scarcity to Efficiency: Improving CLIP Training via Visual-enriched Captions
Zhengfeng Lai, Haotian Zhang, Wentao Wu, Haoping Bai, Aleksei Timofeev...
TL;DR
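This study focuses on improving data quality and data diversity, with particular emphasis on integrating visual concepts into captions. It proposes VeCLIP, a new method for training on web-crawled datasets, and, through a comprehensive evaluation of data efficiency and model performance, shows that VeCLIP offers significant advantages in image-text alignment and overall model performance.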
Abstract
Web-crawled datasets are pivotal to the success of pre-training vision-language models, exemplified by CLIP. However, web-crawled AltTexts can be noisy and potentially irrelevant to images, thereby undermining th...