BriefGPT.xyz
Jul, 2020
Multi-modal Transformer for Video Retrieval
Valentin Gabeur, Chen Sun, Karteek Alahari, Cordelia Schmid
TL;DR
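This paper proposes a video retrieval method based on a multi-modal transformer architecture. The method fully exploits cross-modal cues in video and also incorporates temporal information. We also investigate best practices for jointly optimizing the language embedding and the multi-modal transformer. The method achieves state-of-the-art video retrieval results on three datasets.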
Abstract
The task of retrieving video content relevant to natural language queries plays a critical role in effectively handling internet-scale datasets. Most of the existing methods for this caption-to-video retrieval pr