Cai Chen, Runzhong Zhang, Jianjun Gao, Kejun Wu, Kim-Hui Yap...
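TL;DR: Uses pseudo-query features to strengthen the connection between domains and improve feature alignment between vision and language, achieving better temporal sentence grounding.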
Abstract
Temporal sentence grounding involves retrieving a video moment that matches a natural language query. Many existing works directly incorporate the given video and temporally localized query for temporal grounding, overlooking the inherent domain gap between different modalities. In this