Sep, 2020
Multi-Modal Reasoning Graph for Scene-Text Based Fine-Grained Image Classification and Retrieval
Andres Mafla, Sounak Dey, Ali Furkan Biten, Lluis Gomez, Dimosthenis Karatzas
TL;DR
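This paper uses a graph convolutional network to perform multi-modal reasoning over scene-text instances and salient image regions. On the Con-Text and Drink Bottle datasets, it significantly outperforms the previous state of the art in both fine-grained image classification and image retrieval.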
Abstract
Scene text instances found in natural images carry explicit semantic information that can provide important cues to solve a wide array of computer vision problems. In this paper, we focus on leveraging multi-modal conte