Jun, 2024

Simul-Whisper:带有截断检测的注意力引导流式 Whisper

TL;DRSimul-Whisper is a streaming speech recognition model that utilizes time alignment embedded in Whisper's cross-attention for guiding auto-regressive decoding, achieving chunk-based ASR without fine-tuning, while proposing an integrate-and-fire-based truncation detection model to address the negative effect of truncated words at chunk boundaries, outperforming the current state-of-the-art baseline with a minimal absolute word error rate degradation.