BriefGPT.xyz
Jul, 2024
ModalChorus: Visual Probing and Alignment of Multi-modal Embeddings via Modal Fusion Map
Yilin Ye, Shishi Xiao, Xingchen Zeng, Wei Zeng
TL;DR
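ModalChorus is an interactive system for exploring and aligning vision-language multi-modal embeddings through a Modal Fusion Map (MFM). It improves cross-modal feature representation and model performance on cross-modal tasks.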
Abstract
Multi-modal embeddings form the foundation for vision-language models, such as CLIP embeddings, the most widely used text-image embeddings. However, these embeddings are vulnerable to subtle misalignment of cross
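For readers new to the setting, here is a minimal sketch of how text-image alignment is commonly measured in a shared embedding space like CLIP's: compute the cosine similarity between an image vector and each candidate caption vector, then pick the closest caption. The 4-dimensional vectors are hypothetical toy values, and `cosine_similarity` and `best_caption` are illustrative helpers. They are not part of ModalChorus or any CLIP library, and the sketch does not reproduce the paper's Modal Fusion Map.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_caption(image_emb: list[float], caption_embs: list[list[float]]) -> tuple[int, list[float]]:
    """Return the index of the caption closest to the image, plus all scores."""
    scores = [cosine_similarity(image_emb, c) for c in caption_embs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Hypothetical toy embeddings (real CLIP embeddings have hundreds of dimensions).
image = [0.9, 0.1, 0.3, 0.0]
captions = {
    "a photo of a dog": [0.8, 0.2, 0.4, 0.1],
    "a photo of a cat": [0.1, 0.9, 0.2, 0.3],
}

names = list(captions)
idx, scores = best_caption(image, list(captions.values()))
print(names[idx])  # prints "a photo of a dog"
```

In this toy setup, a misaligned embedding would show up as the wrong caption scoring highest, or as only a small margin between the matching and non-matching captions.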