BriefGPT.xyz
Jun, 2024
VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval
Junjie Zhou, Zheng Liu, Shitao Xiao, Bo Zhao, Yongping Xiong
TL;DR
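We propose VISTA, a new embedding model for universal multi-modal retrieval, which achieves superior performance across a variety of multi-modal retrieval tasks in both zero-shot and supervised settings.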
Abstract
Multi-modal retrieval becomes increasingly popular in practice. However, the existing retrievers are mostly text-oriented, which lack the capability to process visual information. Despite the presence of vision-l