Mar, 2022
Disentangled Representation Learning for Text-Video Retrieval
Qiang Wang, Yanhao Zhang, Yun Zheng, Pan Pan, Xian-Sheng Hua
TL;DR
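This paper studies cross-modal interaction in text-video retrieval and proposes a model built on a disentangled framework, with sequential and hierarchical representations, to improve retrieval performance. Experiments on a range of benchmarks show that the model achieves good results.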
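The sketch below is meant only to make "computing cross-modal interaction" concrete. It assumes that text and video have already been encoded as lists of per-word and per-frame embedding vectors. It then compares two generic ways to score a text-video pair. The first pools each modality into a single vector before comparing the two (global interaction). The second compares every word with every frame (token-wise interaction). The function names and toy vectors are hypothetical illustrations, not the paper's exact formulation.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    # Scale a vector to unit L2 norm so that dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_pool(tokens: List[Vector]) -> Vector:
    # Average word (or frame) embeddings into one global embedding.
    dim = len(tokens[0])
    return [sum(t[i] for t in tokens) / len(tokens) for i in range(dim)]


def global_similarity(words: List[Vector], frames: List[Vector]) -> float:
    # Coarse interaction (illustrative): pool each modality first, then compare once.
    return dot(normalize(mean_pool(words)), normalize(mean_pool(frames)))


def token_wise_similarity(words: List[Vector], frames: List[Vector]) -> float:
    # Fine-grained interaction (illustrative, unweighted): build the full
    # word-by-frame cosine matrix, keep each token's best match, and average
    # the text-to-video and video-to-text directions.
    w = [normalize(v) for v in words]
    f = [normalize(v) for v in frames]
    sim = [[dot(a, b) for b in f] for a in w]
    text_to_video = sum(max(row) for row in sim) / len(w)
    video_to_text = sum(
        max(sim[i][j] for i in range(len(w))) for j in range(len(f))
    ) / len(f)
    return (text_to_video + video_to_text) / 2


if __name__ == "__main__":
    # Hypothetical 2-D embeddings: two words, two frames.
    words = [[1.0, 0.0], [0.0, 1.0]]
    frames = [[1.0, 0.0], [1.0, 0.0]]
    print(global_similarity(words, frames))      # about 0.7071
    print(token_wise_similarity(words, frames))  # 0.75
```

With these toy inputs, the two choices produce different scores for the same pair (about 0.707 vs 0.75). That gap is the kind of effect a design choice in the interaction step can have on retrieval ranking.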
Abstract
Cross-modality interaction is a critical component in text-video retrieval (TVR), yet there has been little examination of how different influencing factors for computing interaction affect performance. This paper first studies the