Oct, 2019
Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval
Sijin Wang, Ruiping Wang, Ziwei Yao, Shiguang Shan, Xilin Chen
TL;DR
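This paper studies how to use visual scene graphs and textual scene graphs to jointly represent the objects and relationships in images and text for cross-modal image-text retrieval. By designing dedicated scene graph encoders, the work extracts object-level and relationship-level cross-modal features and achieves state-of-the-art results on the Flickr30k and MSCOCO datasets.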
Abstract
Image-text retrieval of natural scenes has been a popular research topic. Since image and text are heterogeneous cross-modal data, one of the key challenges is how to learn comprehensive yet unified representatio