BriefGPT.xyz
Sep, 2023
Interpretable Visual Question Answering via Reasoning Supervision
Maria Parelli, Dimitrios Mallis, Markos Diomataris, Vassilis Pitsikalis
TL;DR
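We propose a new visual question answering architecture that uses commonsense reasoning as a supervisory signal to mitigate the performance shortfall of models that lack visual grounding. A similarity loss guides the model's visual attention toward the important elements of the scene, improving both its visual perception and its performance.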
Abstract
Transformer-based architectures have recently demonstrated remarkable performance in the visual question answering (VQA) task. However, such models are likely to disregard crucial visual cues and often rely on mu