June 2024

SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization

TL;DR: Visual Speech Recognition (VSR) aims to interpret spoken content from visual cues. SyncVSR is an end-to-end learning framework that synchronizes visual representations with acoustic data, achieving state-of-the-art results while reducing data usage by up to ninefold.