The domain of joint vision-language understanding, especially reasoning in visual question answering (VQA) models, has garnered significant attention in recent years. While most existing VQA models focus on improving answer accuracy, the way models arrive at those answers has received far less attention.