Jun, 2024

Whisper-Flamingo: 集成视觉特征于 Whisper 中用于音频 - 视觉语音识别和翻译

TL;DRAudio-Visual Speech Recognition (AVSR) uses Whisper-Flamingo, a model that integrates visual features, to improve speech recognition and translation performance in noisy conditions for multiple languages.