BriefGPT.xyz
Dec, 2023
Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval
Zhihang Liu, Jun Li, Hongtao Xie, Pandeng Li, Jiannan Ge...
TL;DR
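By enhancing the features of both the video and the text modality, the Modal-Enhanced Semantic Modeling (MESM) framework achieves more balanced alignment for video moment retrieval, narrowing the gap caused by the imbalance between the two modalities. Experiments show that the framework generalizes well and achieves state-of-the-art results on multiple benchmarks.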
Abstract
Video moment retrieval (VMR) aims to retrieve temporal segments in untrimmed videos corresponding to a given language query by constructing cross-modal alignment strategies. However, these existing strategies are …
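The VMR task definition above can be made concrete with a toy sketch. This is not the MESM method and not code from the paper. It assumes hypothetical clip-level video features and a query embedding that already live in a shared space (real VMR models learn these with trained encoders), scores each clip by its cosine similarity to the query, and returns the contiguous span whose above-average similarity is largest. The function names `cosine` and `retrieve_moment`, the mean-centering heuristic, and the 2-second clip length are illustrative assumptions.

```python
import math
from typing import Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def retrieve_moment(clip_feats: Sequence[Sequence[float]],
                    query_feat: Sequence[float]) -> Tuple[int, int, float]:
    """Return (start, end, score) of the best-aligned contiguous clip span.

    Hypothetical helper for illustration only. `end` is exclusive. Each clip
    is scored by its cosine similarity to the query minus the video-wide mean
    similarity; the span with the largest summed score is returned.
    """
    if not clip_feats:
        raise ValueError("video has no clips")
    sims = [cosine(c, query_feat) for c in clip_feats]
    mean_sim = sum(sims) / len(sims)
    # Above-average clips contribute positively, below-average clips negatively.
    centered = [s - mean_sim for s in sims]

    # Kadane's algorithm: maximum-sum contiguous span of the centered scores.
    best_start, best_end, best_score = 0, 1, centered[0]
    run_start, run_score = 0, 0.0
    for i, s in enumerate(centered):
        if run_score <= 0:
            # A non-positive running sum cannot help; restart the span here.
            run_start, run_score = i, s
        else:
            run_score += s
        if run_score > best_score:
            best_start, best_end, best_score = run_start, i + 1, run_score
    return best_start, best_end, best_score


if __name__ == "__main__":
    # Toy 2-D features: clips 2-4 point in the same direction as the query.
    clips = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 1], [1, 0]]
    query = [0, 1]
    start, end, score = retrieve_moment(clips, query)
    print(start, end, score)  # 2 5 1.5
    clip_sec = 2.0  # assumed clip duration in seconds
    print(start * clip_sec, end * clip_sec)  # 4.0 10.0
```

Mean-centering is a design choice for this sketch: a raw sum of similarities tends to grow with span length when most similarities are positive, and a raw mean can never exceed the best single clip, so it would favor one-clip spans. A learned VMR model replaces both the fixed features and this heuristic with trained cross-modal alignment.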