May, 2016
Leveraging Visual Question Answering for Image-Caption Ranking
Xiao Lin, Devi Parikh
TL;DR
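This work treats the visual question answering (VQA) task as a "feature extraction" module that extracts representations of images and captions. These representations are used to rank image-caption pairs, and fusion models are proposed to improve image-caption consistency matching. In experiments on the MSCOCO dataset, the approach improves caption retrieval by 7.1% and image retrieval by 4.4%.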
Abstract
Visual question answering (VQA) is the task of taking as input an image and a free-form natural language question about the image, and producing an accurate answer. In this work we view VQA as a "feature extraction" ...
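To make the idea of fusing two scoring signals for ranking concrete, here is a minimal sketch of score-level fusion. It is an illustration only, not the authors' model: the function names `fuse_scores` and `rank_candidates`, the weighted-sum rule, the `weight` parameter, and the toy scores are all assumptions introduced for this example.

```python
from typing import List, Sequence


def fuse_scores(base_scores: Sequence[float],
                vqa_scores: Sequence[float],
                weight: float = 0.5) -> List[float]:
    """Hypothetical score-level fusion: a weighted sum of two scores per candidate.

    base_scores: image-caption similarity from some baseline ranking model (assumed).
    vqa_scores:  consistency scores derived from VQA-based features (assumed).
    weight:      share of the final score given to the VQA-based signal (assumed).
    """
    if len(base_scores) != len(vqa_scores):
        raise ValueError("both score lists must cover the same candidates")
    return [(1.0 - weight) * b + weight * v
            for b, v in zip(base_scores, vqa_scores)]


def rank_candidates(scores: Sequence[float]) -> List[int]:
    """Return candidate indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy example: three candidate captions for one image (made-up numbers).
base = [0.2, 0.9, 0.5]
vqa = [0.9, 0.1, 0.7]
fused = fuse_scores(base, vqa, weight=0.5)
ranking = rank_candidates(fused)
print(ranking)
```

In this toy setup the baseline alone would rank candidate 1 first, while the fused scores favor candidate 2, which shows how a second signal can change the ordering. How the scores are actually computed and combined in the paper is not specified in the text above.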