BriefGPT.xyz
Mar, 2023
HiCLIP: Contrastive Language-Image Pretraining with Hierarchy-aware Attention
Shijie Geng, Jianbo Yuan, Yu Tian, Yuxiao Chen, Yongfeng Zhang
TL;DR
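This paper proposes improving the CLIP model with a hierarchy-aware attention mechanism to better capture the high-level semantics of images and text, and reports good results on visual recognition and vision-related downstream tasks.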
Abstract
The success of large-scale contrastive vision-language pretraining (CLIP) has benefited both visual recognition and multimodal content understanding. The concise design brings