BriefGPT.xyz
Oct, 2021
Visual Keyword Spotting with Attention
K R Prajwal, Liliane Momeni, Triantafyllos Afouras, Andrew Zisserman
TL;DR
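This work proposes the Transpotter model, which uses full cross-modal attention between the visual and speech streams to spot spoken keywords in silent video sequences. Across multiple tests it outperforms current visual keyword spotting and lip reading models, and it also shows a strong ability to spot isolated mouthed words.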
Abstract
In this paper, we consider the task of spotting spoken keywords in silent video sequences, also known as visual keyword spotting. To this end, we investigate transformer-based models that ingest two streams, a …