Apr, 2020
Multi-View Attention Networks for Visual Dialog
Sungjin Park, Taesun Whang, Yeochan Yoon, Heuiseok Lim
TL;DR
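The paper proposes the Multi-View Attention Network (MVAN) to address the challenges of the visual dialog task. The model is attention-based and uses multiple views to process heterogeneous inputs, then builds multimodal representations through a sequential alignment process. This lets it better capture the question-relevant information in the dialog history, and it achieves the best results on the VisDial v1.0 dataset.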
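To make the idea of picking out question-relevant dialog history concrete, here is a minimal sketch of question-guided attention over per-round history embeddings. It is not the MVAN architecture: the function name `attend_history`, the scaled dot-product scoring, and the toy embeddings are all assumptions chosen for illustration only.

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attend_history(question, history):
    """Hypothetical helper: weight each dialog round by its similarity to the question.

    question: (d,) embedding of the current question
    history:  (rounds, d) one embedding per previous dialog round
    Returns (weights, context): attention weights over rounds and their weighted sum.
    """
    d = question.shape[0]
    scores = history @ question / np.sqrt(d)  # scaled dot-product scores, shape (rounds,)
    weights = softmax(scores)                 # non-negative, sums to 1
    context = weights @ history               # question-aware history summary, shape (d,)
    return weights, context

# Toy data (assumption): 4 rounds, each a distinct one-hot-like 6-dim embedding.
history = np.eye(4, 6)
question = 2.0 * history[2]  # the question is aligned with the third round

weights, context = attend_history(question, history)
print(weights.argmax())  # index of the round the question attends to most
```

In this toy setup the third round receives the largest weight because it is the only round with a nonzero dot product with the question; a real model would learn the embeddings and typically use several such attention views.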
Abstract
Visual dialog is a challenging vision-language task in which a series of questions visually grounded by a given image are answered. To resolve the visual dialog task, a high-level understanding of various …