BriefGPT.xyz
Mar, 2022
Learning to Answer Questions in Dynamic Audio-Visual Scenarios
Guangyao Li, Yake Wei, Yapeng Tian, Chenliang Xu, Ji-Rong Wen...
TL;DR
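This paper studies the Audio-Visual Question Answering (AVQA) task. It introduces MUSIC-AVQA, a dataset of more than 45K question-answer pairs, and addresses the task with multimodal knowledge and spatio-temporal reasoning over audio-visual scenes. The results show that our method outperforms existing A-V and AVQA methods.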
Abstract
In this paper, we focus on the audio-visual question answering (AVQA) task, which aims to answer questions regarding different visual objects, sounds, and their associations in videos. The problem requires comprehensive...