October 2023
SILC: Improving Vision Language Pretraining with Self-Distillation
Muhammad Ferjad Naeem, Yongqin Xian, Xiaohua Zhai, Lukas Hoyer, Luc Van Gool...
TL;DR
Building on the CLIP model, this work proposes SILC, which adds local-to-global correspondence learning to pretraining. The method effectively improves performance on computer vision tasks such as classification, retrieval, and segmentation, achieving state-of-the-art results on zero-shot classification, few-shot classification, image-text retrieval, zero-shot segmentation, and open-vocabulary segmentation.
Abstract
Image-text pretraining on web-scale image caption datasets has become the default recipe for open vocabulary classification and retrieval models thanks to the success of CLIP and its variants. Several works have also used CLIP features for dense prediction tasks and have shown the emergence of open-set abilities. …
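As described above, SILC combines a CLIP-style contrastive image-text loss with a local-to-global self-distillation term. Below is a minimal sketch of how these two objectives could be combined, assuming a DINO-style EMA teacher; all names here (clip_contrastive_loss, self_distillation_loss, ema_update, the encoders, and the weight lam) are hypothetical illustrations, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over L2-normalized image/text embeddings (CLIP-style)."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

def self_distillation_loss(student_local, teacher_global,
                           t_student=0.1, t_teacher=0.04):
    """Local-to-global distillation: the student sees a local crop and matches
    the teacher's distribution over a global view; no gradient to the teacher."""
    teacher_probs = F.softmax(teacher_global.detach() / t_teacher, dim=-1)
    student_logp = F.log_softmax(student_local / t_student, dim=-1)
    return -(teacher_probs * student_logp).sum(dim=-1).mean()

@torch.no_grad()
def ema_update(teacher, student, momentum=0.996):
    """Teacher weights track an exponential moving average of the student."""
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(momentum).add_(s, alpha=1.0 - momentum)

# Per batch (hypothetical encoders; teacher_enc is an EMA copy of image_enc):
#   loss = clip_contrastive_loss(image_enc(global_view), text_enc(captions)) \
#          + lam * self_distillation_loss(image_enc(local_crop),
#                                         teacher_enc(global_view))
#   loss.backward(); optimizer.step(); ema_update(teacher_enc, image_enc)
```

The design point is that the contrastive term only aligns global image and text embeddings, while the distillation term forces local image features to be predictive of the global view, which is what benefits dense tasks like segmentation.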