Interpretable Model for Scene Graph Generation

Nov, 2018

An Interpretable Model for Scene Graph Generation

Ji Zhang, Kevin Shih, Andrew Tao, Bryan Catanzaro, Ahmed Elgammal

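TL;DR: The paper proposes an efficient and interpretable scene graph generator. It considers three types of features (visual, spatial, and semantic) and combines them with a late fusion strategy. The model performed strongly in the OpenImages Visual Relationship Detection competition, scoring 5% higher than the second-place entry (a 20% relative improvement). The generator is an important building block for vision-language tasks such as image captioning and visual question answering.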
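To make the late fusion idea concrete, here is a minimal sketch in which each feature branch (visual, spatial, semantic) produces its own predicate logits and the branches are combined only at the end. The summary does not say how the paper fuses the branches, so the elementwise sum before softmax, the function names, the predicate list, and all numbers below are assumptions for illustration only.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def late_fusion(branch_logits):
    """Fuse branches by summing their logits elementwise, then softmax.

    Assumption: additive fusion; the actual model may fuse differently.
    """
    fused = [sum(vals) for vals in zip(*branch_logits.values())]
    return softmax(fused)

def branch_contributions(branch_logits, class_index):
    """Return each branch's additive share of the fused logit for one class."""
    return {name: logits[class_index] for name, logits in branch_logits.items()}

# Hypothetical predicate classes and per-branch logits (made-up values).
PREDICATES = ["on", "holds", "under"]
branch_logits = {
    "visual":   [1.0, 2.5, 0.0],
    "spatial":  [2.0, 0.0, 0.5],
    "semantic": [0.5, 1.5, 0.0],
}

probs = late_fusion(branch_logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(PREDICATES[best], branch_contributions(branch_logits, best))
```

Because the branches stay separate until the final sum, the share each feature type adds to a predicted predicate can be read off directly, which is the sense in which late fusion supports interpretability in this sketch.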

Abstract

We propose an efficient and interpretable scene graph generator. We consider three types of features: visual, spatial and semantic, and we use a late fusion strategy such that each feature's contribution can be explicitly investigated. We study the key factors about these features that