Multitask Text-to-Visual Embedding with Titles and Clickthrough Data

May, 2019

Pranav Aggarwal, Zhe Lin, Baldo Faieta, Saeid Motiian

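TL;DR: The paper proposes a new method that learns a text-visual embedding from image titles and from clickthrough data collected by an image search engine. It also proposes a new triplet loss function that models positive awareness of the embedding, and introduces a new mini-batch-based hard negative sampling approach to make learning more data-efficient. Experimental results show that the method outperforms existing methods and is also effective for real-world text-to-visual retrieval.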

Abstract

Text-visual (also called semantic-visual) embedding is a central problem in vision-language research. It typically involves mapping an image and a text description to a common feature space through a CNN image encoder and an RNN language encoder. In this paper, we propose a new method