Apr, 2019
TVQA+: Spatio-Temporal Grounding for Video Question Answering
Jie Lei, Licheng Yu, Tamara L. Berg, Mohit Bansal
TL;DR
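This work addresses video question answering by augmenting the dataset with bounding-box annotations. Building on that data, it proposes the STAGE framework, which processes video in both the spatial and temporal domains to answer natural-language questions about videos, and it presents experimental results and visualizations.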
Abstract
We present the task of spatio-temporal video question answering, which requires intelligent systems to simultaneously retrieve relevant moments and detect referenced visual concepts (people and objects) to answer …