BriefGPT.xyz
Apr, 2022
Robust Cross-Modal Representation Learning with Progressive Self-Distillation
Alex Andonian, Shixing Chen, Raffay Hamid
TL;DR
By combining cross-modal contrastive learning with soft image-text alignment, this work improves on the CLIP model, learning robust representations more efficiently from noisy datasets. Extensive evaluation on 14 benchmark datasets shows that the method outperforms CLIP across multiple settings without adding computational cost. It also demonstrates better robustness to natural distribution shifts.
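The core idea of soft image-text alignment via self-distillation can be sketched as a CLIP-style contrastive loss whose one-hot targets are blended with the model's own predicted alignments. This is a minimal illustrative sketch, not the paper's exact formulation: the blending scheme, the `alpha` schedule, and all function names here are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def soft_contrastive_loss(img_emb, txt_emb, alpha=0.5, temp=0.07):
    """Cross-entropy between image-text similarities and blended targets.

    alpha mixes hard one-to-one targets (the identity matrix, as in CLIP)
    with soft targets derived from the model's own similarity scores,
    i.e. the model acts as its own teacher (self-distillation).
    """
    # L2-normalize embeddings and compute temperature-scaled cosine logits.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temp

    n = logits.shape[0]
    hard = np.eye(n)                # standard CLIP one-hot targets
    soft = softmax(logits, axis=1)  # model-predicted (teacher) alignment
    targets = (1 - alpha) * hard + alpha * soft

    # Cross-entropy of the image->text direction against the soft targets.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -(targets * log_probs).sum(axis=1).mean()

rng = np.random.default_rng(0)
loss = soft_contrastive_loss(rng.normal(size=(4, 8)), rng.normal(size=(4, 8)))
print(float(loss))
```

In the paper's progressive setting, the weight on the self-distilled targets is increased over the course of training; with `alpha=0` this reduces to the standard hard-target contrastive objective.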
Abstract
The learning objective of the vision-language approach of CLIP does not effectively account for the noisy many-to-many correspondences found in web-harvested image captioning datasets, which contributes to its comput…