Sudipta Paul, Niluthpol Chowdhury Mithun, Amit K. Roy-Chowdhury
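TL;DR: This paper proposes the Hierarchical Moment Alignment Network, a method that, given a text query, retrieves the relevant video from a video corpus and localizes the corresponding moment within that video. Experimental results show that the method achieves promising performance on three benchmark datasets.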
Abstract
Prior works on text-based video moment localization focus on temporally grounding the textual query in an untrimmed video. These works assume that the relevant video is already known and attempt to localize the moment on that relevant video only. Different from such works, we relax thi