BriefGPT.xyz
Feb, 2017
MAT: A Multimodal Attentive Translator for Image Captioning
Chang Liu, Fuchun Sun, Changhu Wang, Feng Wang, Alan Yuille
TL;DR
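Using a sequence-to-sequence recurrent neural network model, the method extracts a sequence of objects from the image and introduces a sequential attention layer, so that the sequential information in the image is translated naturally into a sequence of words. It outperforms existing methods on the MS COCO dataset and also achieves competitive results on the evaluation server.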
Abstract
In this work we formulate the problem of image captioning as a multimodal translation task. Analogous to machine translation, we present a sequence-to-sequence ...