BriefGPT.xyz
Apr, 2022
Reasoning with Multi-Structure Commonsense Knowledge in Visual Dialog
Shunyu Zhang, Xiaoze Jiang, Zequn Yang, Tao Wan, Zengchang Qin
TL;DR
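This paper proposes a reasoning model based on multi-structure commonsense knowledge. It represents external knowledge as sentence-level facts and graph-level facts, captures the relevant knowledge through graph interaction and transformer fusion, and integrates it into the visual and semantic features. The model outperforms the comparison methods on the VisDial v1.0 and VisDialCK datasets.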
Abstract
Visual dialog requires an agent to engage in a conversation with humans grounded in an image. Many studies on visual dialog focus on the understanding of the dialog history or the content of an image, while a con