BriefGPT.xyz
Sep, 2018
Visual Coreference Resolution in Visual Dialog using Neural Module Networks
Satwik Kottur, José M. F. Moura, Devi Parikh, Dhruv Batra, Marcus Rohrbach
TL;DR
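This work proposes a neural module network architecture that introduces two new modules, Refer and Exclude, to perform explicit, grounded coreference resolution at the finer word level. It targets visual coreference resolution in visual dialog and demonstrates its effectiveness on the MNIST Dialog and VisDial datasets.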
Abstract
Visual dialog entails answering a series of questions grounded in an image, using dialog history as context. In addition to the challenges found in visual question answering (VQA), which can be seen as one-round dialog,