VU-BERT: A Unified Framework for Visual Dialog

Feb, 2022

VU-BERT: A Unified framework for Visual Dialog

Tong Ye, Shijing Si, Jianzong Wang, Rui Wang, Ning Cheng...

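TL;DR: This paper proposes VU-BERT, a unified framework for joint image-text embedding. It simplifies the model by obtaining visual embeddings through patch projection, which addresses the difficulty of using the modality-specific modules that existing work relies on to model interactions, and it achieves competitive performance on visual dialog tasks.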
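
To make the patch projection step concrete, here is a minimal NumPy sketch of the general technique: the image is cut into non-overlapping patches, each patch is flattened, and one linear layer maps it to an embedding vector. This is an illustration and not the authors' implementation. The function name `patch_projection`, the 224x224 image size, the patch size of 32, the hidden size of 768, and the randomly initialized weights are all assumptions chosen for the example.

```python
import numpy as np


def patch_projection(image, patch_size, weight, bias):
    """Split an (H, W, C) image into non-overlapping patches and project each one linearly."""
    h, w, c = image.shape
    p = patch_size
    if h % p or w % p:
        raise ValueError("image height and width must be divisible by patch_size")
    # (H/p, p, W/p, p, C) -> (H/p, W/p, p, p, C) -> (num_patches, p*p*C)
    patches = image.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    patches = patches.reshape(-1, p * p * c)
    # One linear projection shared by all patches.
    return patches @ weight + bias


# Hypothetical sizes, chosen only for illustration.
rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))
patch_size, hidden = 32, 768
weight = rng.normal(scale=0.02, size=(patch_size * patch_size * 3, hidden))
bias = np.zeros(hidden)

visual_embeddings = patch_projection(image, patch_size, weight, bias)
print(visual_embeddings.shape)  # (49, 768)
```

In a trained model the projection weights would be learned jointly with the rest of the network. The resulting sequence of patch embeddings can then be fed to a transformer alongside the text token embeddings, with no separate object detector or visual encoder.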

Abstract

The visual dialog task attempts to train an agent to answer multi-turn questions given an image, which requires a deep understanding of the interactions between the image and the dialog history. Existing researches ten