BriefGPT.xyz
May, 2024
CLIP with Quality Captions: A Strong Pretraining for Vision Tasks
Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri, Oncel Tuzel
TL;DR
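Improving the quality of captions in image-text datasets improves the visual representations learned by CLIP models and yields significant performance gains on dense prediction vision tasks.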
Abstract
CLIP models perform remarkably well on zero-shot classification and retrieval tasks. But recent studies have shown that learnt representations in CLIP are not well suited for dense prediction tasks like object de