BriefGPT.xyz
Jan, 2024
Cross-Modal Coordination Across a Diverse Set of Input Modalities
Jorge Sánchez, Rodrigo Laguna
TL;DR
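The authors propose two approaches to cross-modal retrieval. The first extends CLIP to an arbitrary number of input modalities. The second tackles the coordination problem by regressing cross-modal similarities. Experiments on several datasets show that both approaches are simple and effective and open up new ways of solving retrieval problems.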
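The summary says only that the first method extends CLIP to any number of modalities. One natural reading, which is an assumption here and not necessarily the authors' formulation, is to apply CLIP's symmetric contrastive (InfoNCE) loss to every pair of modalities and average the results. The plain-Python sketch below does exactly that. The function names, the fixed temperature of 0.07, and the pairwise averaging are illustrative choices, not taken from the paper.

```python
import itertools
import math


def l2_normalize(v):
    # Scale a vector to unit length (zero vectors are left unchanged).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine_matrix(a, b):
    # Pairwise cosine similarities between rows of a and rows of b.
    a = [l2_normalize(v) for v in a]
    b = [l2_normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b] for u in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def cross_entropy_rows(logits):
    # Mean cross-entropy where the correct "class" for row i is column i.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def clip_pair_loss(a, b, temperature=0.07):
    # Symmetric CLIP-style loss for one pair of modalities (a->b and b->a).
    logits = [[s / temperature for s in row] for row in cosine_matrix(a, b)]
    return 0.5 * (cross_entropy_rows(logits) + cross_entropy_rows(transpose(logits)))


def multimodal_clip_loss(embeddings, temperature=0.07):
    # Hypothetical extension to M modalities: average the pairwise loss over
    # all modality pairs. embeddings maps a modality name to N vectors, and
    # row i of every modality describes the same underlying sample.
    pairs = list(itertools.combinations(sorted(embeddings), 2))
    total = sum(
        clip_pair_loss(embeddings[p], embeddings[q], temperature) for p, q in pairs
    )
    return total / len(pairs)
```

With three toy modalities whose matching rows coincide, the loss is close to zero. Swapping the rows of one modality, so that its samples no longer line up with the others, makes the loss large. Averaging over pairs keeps the loss on a comparable scale as modalities are added. Summing instead would weight settings with more modalities more heavily.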
Abstract
Cross-modal retrieval is the task of retrieving samples of a given modality by using queries of a different one. Due to the wide range of practical applications, the problem has been mainly focused on the vision and language …
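To make the task definition concrete, here is a minimal sketch of the retrieval step itself. It assumes that some encoders (not shown, and not specified here) have already mapped a query from one modality, such as text, and a gallery from another, such as images, into a shared embedding space. Gallery items are then ranked by cosine similarity to the query. The vectors are toy values, and `retrieve` is a hypothetical helper, not code from the paper.

```python
import math


def cosine(u, v):
    # Cosine similarity; returns 0.0 if either vector has zero length.
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve(query, gallery, k=1):
    """Return indices of the k gallery items most similar to query, best first.

    query and gallery come from different modalities but are assumed to
    live in the same embedding space. Ties are broken by lower index.
    """
    scores = [(cosine(query, item), idx) for idx, item in enumerate(gallery)]
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scores[:k]]


# Toy example: a "text" query against three "image" embeddings.
text_query = [0.9, 0.1, 0.0]
image_gallery = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.5, 0.5, 0.7]]
print(retrieve(text_query, image_gallery, k=2))  # prints [1, 2]
```

Training objectives such as a contrastive or similarity-regression loss determine how good the shared space is. At query time, retrieval reduces to this kind of nearest-neighbour ranking.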