Word2VisualVec: Matching Images and Videos to Sentences by Visual Feature Prediction

Apr, 2016

Word2VisualVec: Cross-Media Retrieval by Visual Feature Prediction

Jianfeng Dong, Xirong Li, Cees G. M. Snoek

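TL;DR: This work aims to find the sentence that best describes the content of an image or video. It builds a deep neural network architecture called Word2VisualVec, which combines sentence vectorization with a multi-layer perceptron, to match images or videos with sentences. The architecture achieves notable results in experiments on four challenging image and video benchmarks.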
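The pipeline summarized above (sentence vector, then a multi-layer perceptron that predicts a visual feature, then matching in the visual space) can be illustrated with a minimal sketch. This is not the authors' implementation: the vocabulary, the dimensions, the mean-pooling of word vectors, the randomly initialised weights, and the function names `sentence_vector`, `predict_visual_feature`, and `rank_sentences` are all assumptions chosen for illustration. A real system would use pretrained word embeddings, trained weights, and features from a visual network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes: word vector, hidden layer, visual feature.
WORD_DIM, HIDDEN_DIM, VISUAL_DIM = 8, 16, 12

# Hypothetical vocabulary with random vectors (stand-in for pretrained embeddings).
vocab = ["a", "dog", "cat", "runs", "sleeps", "on", "grass", "sofa"]
word_vecs = {w: rng.normal(size=WORD_DIM) for w in vocab}

# Randomly initialised MLP weights (assumption: a trained model would learn these).
W1 = rng.normal(scale=0.1, size=(WORD_DIM, HIDDEN_DIM))
b1 = np.zeros(HIDDEN_DIM)
W2 = rng.normal(scale=0.1, size=(HIDDEN_DIM, VISUAL_DIM))
b2 = np.zeros(VISUAL_DIM)


def sentence_vector(sentence):
    """Mean of the known word vectors (one simple choice of sentence vectorization)."""
    vecs = [word_vecs[w] for w in sentence.lower().split() if w in word_vecs]
    if not vecs:
        return np.zeros(WORD_DIM)
    return np.mean(vecs, axis=0)


def predict_visual_feature(sentence):
    """One-hidden-layer MLP that maps a sentence vector into the visual feature space."""
    hidden = np.maximum(0.0, sentence_vector(sentence) @ W1 + b1)  # ReLU
    return hidden @ W2 + b2


def cosine(u, v):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


def rank_sentences(image_feature, sentences):
    """Score each sentence by similarity between its predicted feature and the image feature."""
    scored = [(cosine(image_feature, predict_visual_feature(s)), s) for s in sentences]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


candidates = ["a dog runs on grass", "a cat sleeps on sofa", "a dog sleeps"]
# Stand-in for a real image feature: reuse the prediction for the first caption.
image_feature = predict_visual_feature(candidates[0])
ranked = rank_sentences(image_feature, candidates)
for score, sentence in ranked:
    print(f"{score:.3f}  {sentence}")
```

Because the matching happens in the visual feature space, the image side needs no extra projection in this sketch: only the text is transformed, and candidate sentences are ranked by cosine similarity to the given image feature.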

Abstract

This paper attacks the challenging problem of cross-media retrieval. That is, given an image find the text best describing its content, or the other way around. Different from existing works, which either rely on a joint space, or a text space, we propose to perform cross-media retriev