Jun, 2024
Whisper-Flamingo: 集成视觉特征于 Whisper 中用于音频 - 视觉语音识别和翻译
Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation
Andrew Rouditchenko, Yuan Gong, Samuel Thomas, Leonid Karlinsky, Hilde Kuehne...
TL;DRAudio-Visual Speech Recognition (AVSR) uses Whisper-Flamingo, a model that integrates visual features, to improve speech recognition and translation performance in noisy conditions for multiple languages.