BriefGPT.xyz
Dec, 2023
SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference
Feng Wang, Jieru Mei, Alan Yuille
TL;DR
By introducing a novel Correlative Self-Attention (CSA) mechanism, this work unlocks CLIP's potential for semantic segmentation, clearly surpassing both existing SoTA results and the vanilla CLIP model in zero-shot mIoU.
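The TL;DR attributes the gains to the CSA mechanism. As a rough illustration only, the sketch below shows one common reading of correlative attention: attention scores are built from the q–q and k–k self-correlations rather than the usual q–k product, so each token attends to tokens similar to itself, which helps spatial locality in dense prediction. Function names, the scaling factor, and the exact combination of the two score maps are assumptions here, not the paper's verified formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def correlative_self_attention(x, w_q, w_k, w_v):
    """Sketch of Correlative Self-Attention (CSA).

    Assumed form: instead of softmax(q @ k.T) as in standard
    self-attention, scores come from the self-correlations
    q @ q.T and k @ k.T, summed. This is illustrative, not the
    paper's exact definition.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d = q.shape[-1]
    attn = softmax(q @ q.T / np.sqrt(d)) + softmax(k @ k.T / np.sqrt(d))
    return attn @ v

# toy usage: 4 visual tokens with embedding dimension 8
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = correlative_self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 8): one updated embedding per token
```

Because both score maps are symmetric in the token's own projection, a token with high similarity to its spatial neighbors keeps its attention mass local, which is the intuition behind using CSA for dense (per-pixel) inference.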
Abstract
Recent advances in contrastive language-image pretraining (CLIP) have demonstrated strong capabilities in zero-shot classification by aligning visual representations with target text embeddings at the image level.