BriefGPT.xyz
May, 2024
FFF: Fixing Flawed Foundations in contrastive pre-training results in very strong Vision-Language models
Adrian Bulat, Yassine Ouali, Georgios Tzimiropoulos
TL;DR
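This paper studies problems in vision-language contrastive pre-training. It proposes effective methods for two of them: incorrect assignment of negative samples, and captions of low quality and insufficient diversity. Training with a sigmoid loss, it reports very large gains in image recognition and image retrieval.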
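The summary above combines a sigmoid loss with corrected negative assignment. As a rough illustration of why the two fit together, here is a minimal NumPy sketch of a pairwise sigmoid contrastive loss that accepts an arbitrary mask of positive pairs. This is not the authors' implementation: the function name, the `positives` mask, and the `scale` and `bias` defaults are all assumptions chosen for illustration.

```python
import numpy as np


def sigmoid_contrastive_loss(img_emb, txt_emb, positives, scale=10.0, bias=-5.0):
    """Pairwise sigmoid loss over every image-text pair in a batch.

    positives[i, j] is True when caption j is treated as a match for
    image i. Unlike a softmax loss, a row may hold several positives.
    scale and bias are illustrative values, not taken from the paper.
    """
    # L2-normalise so the dot product is a cosine similarity.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)

    logits = scale * (img @ txt.T) + bias

    # +1 for pairs marked positive, -1 for pairs treated as negatives.
    labels = np.where(positives, 1.0, -1.0)

    # -log(sigmoid(z)) == log(1 + exp(-z)), computed stably with logaddexp.
    per_pair = np.logaddexp(0.0, -labels * logits)
    return per_pair.sum() / img_emb.shape[0]


# Two near-duplicate image-caption pairs in one batch.
img = np.array([[1.0, 0.0], [1.0, 0.0]])
txt = np.array([[1.0, 0.0], [1.0, 0.0]])

# Default assignment: only the diagonal counts as positive.
loss_diagonal = sigmoid_contrastive_loss(img, txt, np.eye(2, dtype=bool))

# Corrected assignment: the off-diagonal matches are also positives.
loss_corrected = sigmoid_contrastive_loss(img, txt, np.ones((2, 2), dtype=bool))
```

In this toy batch the diagonal-only assignment penalises the model for scoring genuinely matching pairs highly, so `loss_diagonal` comes out larger than `loss_corrected`. Because each pair is scored independently, relabelling a pair only changes that pair's term.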
Abstract
Despite noise and caption quality having been acknowledged as important factors impacting vision-language contrastive pre-training, in thi