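TL;DR: This paper proposes a novel candidate-free method, the Fine-grained Semantic Alignment Network (FSAN), for the weakly supervised Temporal Language Grounding task. It achieves state-of-the-art performance on two widely used benchmarks.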
Abstract
Temporal language grounding (TLG) aims to localize a video segment in an untrimmed video based on a natural language description. To alleviate the high cost of manually annotating temporal boundary labels, we focus on the weakly supervised setting, where only video-leve