On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering

Feb, 2020

On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering

Xinyu Wang, Yuliang Liu, Chunhua Shen, Chun Chet Ng, Canjie Luo...

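TL;DR: This work proposes a multilingual dataset aimed at the generalization problem of visual question answering methods, uses a reasoning-based metric to encourage generalization, and provides experimental evidence of the dataset's value.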

Abstract

Visual question answering (VQA) methods have made incredible progress, but suffer from a failure to generalize. This is visible in the fact that they are vulnerable to learning coincidental correlations in the data rather than deeper relations between image content and ideas expressed