In this work, we demonstrate the existence of universal adversarial audio perturbations that cause automatic speech recognition (ASR) systems to mis-transcribe audio signals. We propose an algorithm to find a single quasi-imperceptible perturbation which, when added to an arbitrary speech signal, will most likely fool the victim speech recognition model. Our experiments demonstrate the proposed technique by crafting audio-agnostic universal perturbations for the state-of-the-art ASR system, Mozilla DeepSpeech. Additionally, we show that such perturbations generalize to a significant extent to models that are not available during training, by performing a transferability test on a WaveNet-based ASR system.
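To make the idea of a single, input-agnostic perturbation concrete, the following is a minimal toy sketch and not the algorithm proposed in this work. It assumes a small linear classifier as a stand-in for the victim ASR model, a cross-entropy loss in place of a transcription loss, and an L-infinity bound `eps` as a crude proxy for quasi-imperceptibility. The names `universal_perturbation`, `fooling_rate`, `eps`, and `step` are hypothetical.

```python
import numpy as np

def universal_perturbation(signals, labels, weights, eps=0.05, step=0.005, epochs=20):
    """Toy sketch: learn one perturbation shared by every signal (assumed setup)."""
    delta = np.zeros(signals.shape[1])
    for _ in range(epochs):
        for x, y in zip(signals, labels):
            # Forward pass of the stand-in linear model on the perturbed signal.
            logits = weights @ (x + delta)
            probs = np.exp(logits - logits.max())
            probs /= probs.sum()
            # Gradient of cross-entropy w.r.t. the input: W^T (p - onehot(y)).
            probs[y] -= 1.0
            grad = weights.T @ probs
            # Ascend the loss with a sign step, then clip to the eps-ball.
            delta = np.clip(delta + step * np.sign(grad), -eps, eps)
    return delta

def fooling_rate(signals, labels, weights, delta):
    """Fraction of signals whose prediction differs from the reference label."""
    preds = np.argmax((signals + delta) @ weights.T, axis=1)
    return float(np.mean(preds != labels))

# Synthetic data: 50 random "signals" of 64 samples and a random 3-class model.
rng = np.random.default_rng(0)
weights = rng.normal(size=(3, 64))
signals = 0.1 * rng.normal(size=(50, 64))
# Use the model's own clean predictions as reference labels.
labels = np.argmax(signals @ weights.T, axis=1)

delta = universal_perturbation(signals, labels, weights)
print("max |delta|:", np.max(np.abs(delta)))
print("fooling rate:", fooling_rate(signals, labels, weights, delta))
```

The same `delta` is added to every signal, which is what distinguishes a universal perturbation from a per-example adversarial attack. A real attack on an ASR system would need gradients of the model's transcription loss through the full network, which this sketch does not attempt.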

Universal Adversarial Perturbations for Speech Recognition Systems