Abstract
Existing attention mechanisms either attend to local image-grid features or to object-level features for visual question answering (VQA). Motivated by the observation that questions can relate to both object instances and their parts, we propose a novel