BriefGPT.xyz
Jun, 2019
Cross-Modal Interaction Networks for Query-Based Moment Retrieval in Videos
Zhu Zhang, Zhijie Lin, Zhou Zhao, Zhenxin Xiao
TL;DR
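This paper proposes a novel Cross-Modal Interaction Network (CMIN) that jointly considers the syntactic structure of the language query, the semantic dependencies in the video context, and cross-modal interaction. It does so through a syntactic graph convolutional network, multi-head self-attention, and multi-stage cross-modal interaction, improving video retrieval accuracy.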
Abstract
Query-based moment retrieval aims to localize the most relevant moment in an untrimmed video according to the given natural language query. Existing works often only focus on one aspect of this emerging task, suc