MaXM: A Benchmark for Multilingual Visual Question Answering

Sep, 2022

Towards Multi-Lingual Visual Question Answering

Soravit Changpinyo, Linting Xue, Idan Szpektor, Ashish V. Thapliyal, Julien Amelot...

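TL;DR: This paper proposes scalable solutions for multilingual visual question answering (mVQA) that cover both data generation and modeling, and these solutions achieve strong performance across 13 languages. It also introduces MaXM, a test-only dataset in 7 diverse languages, which extends mVQA beyond English to other languages.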

Abstract

Visual question answering (VQA) has been studied primarily through the lens of the English language. Yet tackling VQA in other languages in the same manner would require a considerable amount of resources. In this paper, we propose scalable solutions to multi-lingual …