BriefGPT.xyz
Jun, 2020
更好地利用图片描述提升图像字幕质量
Improving Image Captioning with Better Use of Captions
HTML
PDF
Zhan Shi, Xu Zhou, Xipeng Qiu, Xiaodan Zhu
TL;DR
本文提出了一种新的图像字幕架构,通过构建以字幕为导向的视觉关系图以及利用弱监督多实例学习引入有益的归纳偏差来增强图像表示和字幕生成,实现多模态问题解决和优化。在MSCOCO数据集上进行广泛实验,证明该框架在多种评估指标下取得了业内最优表现。
Abstract
image captioning
is a
multimodal problem
that has drawn extensive attention in both the natural language processing and computer vision community. In this paper, we present a novel
→