BriefGPT.xyz
Dec, 2021
Contrastive Vision-Language Pre-training with Limited Resources
ZeroVL: A Strong Baseline for Aligning Vision-Language Representations with Limited Resources
Quan Cui, Boyan Zhou, Yu Guo, Weidong Yin, Hao Wu...
TL;DR
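This paper proposes a new method for aligning dual-encoder multi-modal representations with limited resources and demonstrates its effectiveness on large-scale data.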
Abstract
Pioneering dual-encoder pre-training works (e.g., CLIP and ALIGN) have revealed the potential of aligning multi-modal representations with …