Apr, 2018
Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering
Duy-Kien Nguyen, Takayuki Okatani
TL;DR
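This paper proposes a simple, fully symmetric network architecture, built on multi-step interaction and attention mechanisms, for fusing visual and language features in visual question answering. It achieves new state-of-the-art results, and the proposed attention mechanism also produces reasonable attention maps that lead to correct answer predictions.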
Abstract
A key solution to visual question answering (VQA) exists in how to fuse visual and language features extracted from an input image and question. We show that an attention mechanism that enables dense, bi-directio