June 2024
Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection
Haoyu Wang, Guoqiang Hu, Guodong Lin, Wei-Qiang Zhang, Jian Li
TL;DR: Simul-Whisper is a streaming speech recognition model that uses the time alignment embedded in Whisper's cross-attention to guide auto-regressive decoding, achieving chunk-based ASR without fine-tuning. It also proposes an integrate-and-fire-based truncation detection model to mitigate the negative effect of words truncated at chunk boundaries. It outperforms the current state-of-the-art baseline with minimal absolute word error rate degradation.
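To make the integrate-and-fire idea concrete, here is a toy sketch of the general mechanism only, not the paper's truncation detection model. It assumes per-frame weights in [0, 1] are already available (in practice a trained network would predict them). The function names, the firing threshold of 1.0, and the `residual_tol` cutoff are all hypothetical choices for illustration. Weights are accumulated frame by frame, a boundary "fires" whenever the accumulator reaches the threshold, and a large leftover at the end of the chunk is taken as a hint that the chunk stops partway through a word.

```python
def integrate_and_fire(weights, threshold=1.0):
    """Accumulate per-frame weights and fire whenever the sum reaches threshold.

    Returns (fired, residual): the frame indices where a boundary fired and
    the accumulated weight left over after the last frame.
    Assumes each weight lies in [0, threshold], so at most one fire per frame.
    """
    fired = []
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if acc >= threshold:
            fired.append(i)
            acc -= threshold  # carry the overshoot into the next unit
    return fired, acc


def chunk_ends_mid_word(weights, threshold=1.0, residual_tol=0.5):
    """Hypothetical heuristic: a large leftover suggests a truncated final word."""
    _, residual = integrate_and_fire(weights, threshold)
    return residual >= residual_tol


if __name__ == "__main__":
    # Hypothetical frame weights for one audio chunk.
    chunk = [0.5, 0.5, 0.25, 0.75, 0.5]
    print(integrate_and_fire(chunk))
    print(chunk_ends_mid_word(chunk))
```

Under these assumptions, a streaming decoder could hold back the last hypothesized word whenever `chunk_ends_mid_word` returns true and re-decode it once the next chunk arrives.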