BriefGPT.xyz
Jun, 2024
LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition
Sreyan Ghosh, Sonal Kumar, Ashish Seth, Purva Chiniya, Utkarsh Tyagi...
TL;DR
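LipGER is a novel framework that uses visual cues from lip motion to improve the performance of automatic speech recognition (ASR) systems in noisy environments. It has an LLM learn the task of visually-conditioned ASR error correction, which greatly eases key challenges of conventional AVSR training.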
Abstract
Visual cues, like lip motion, have been shown to improve the performance of automatic speech recognition (ASR) systems in noisy environments.