Beyond VQA: Generating Multi-word Answer and Rationale to Visual Questions

October 2020

Beyond VQA: Generating Multi-word Answer and Rationale to Visual Questions

Radhika Dua, Sai Srinivas Kancheti, Vineeth N Balasubramanian

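TL;DR: This paper introduces a new task, ViQAR (Visual Question Answering and Reasoning), and proposes a fully generative solution that produces a complete answer and rationale for a visual query. Qualitative and quantitative evaluations, together with a human Turing test, show that the model generates strong answers and rationales.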

Abstract

Visual Question Answering is a multi-modal task that aims to measure high-level visual understanding. Contemporary VQA models are restrictive in the sense that answers are obtained via classification over a limited vocabulary (in the case of open-ended VQA), or via classification over