BriefGPT.xyz
Apr, 2022
Improving Cross-Modal Understanding in Visual Dialog via Contrastive Learning
Feilong Chen, Xiuyi Chen, Shuang Xu, Bo Xu
TL;DR
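This paper proposes ICMU, a method built on the VD-BERT model that improves cross-modal understanding by using four-way contrastive learning to distinguish between different inputs. ICMU supports multi-turn visual dialog, strengthens the cross-modal understanding of visual dialog models, and achieves satisfactory results on the VisDial dataset.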
Abstract
Visual dialog is a challenging vision-language task since the visual dialog agent needs to answer a series of questions after reasoning over both the image content and dialog history. Though existing methods try