Yicong Li, Xiang Wang, Junbin Xiao, Wei Ji, Tat-Seng Chua
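TL;DR: A new learning framework that stabilizes non-critical information in VideoQA models while preserving critical information, improving the models' reasoning ability.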
Abstract
Video question answering (VideoQA) is the task of answering questions about a video. At its core is understanding the alignment between the visual scenes in the video and the linguistic semantics of the question in order to yield the answer. In leading VideoQA models, the typical learning objective,