In this paper, we explore a continuous modeling approach for deep-learning-based speech enhancement, focusing on the denoising process. We represent the denoising process with a state variable whose starting state is noisy speech and whose ending state is clean speech. As the state index advances, the noise component of the state variable decreases until it reaches zero. During training, a UNet-like neural network learns to estimate every state variable sampled from the continuous denoising process. At test time, we feed the network a controlling factor, ranging from zero to one, as an embedding, which lets us control the level of noise reduction. This approach enables controllable speech enhancement and adapts to various application scenarios. Experimental results indicate that preserving a small amount of noise in the clean target benefits speech enhancement, as evidenced by improvements in both objective speech measures and automatic speech recognition performance.
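The state variable described above can be illustrated with a minimal sketch. The abstract does not give the schedule by which the noise component shrinks, so the linear decay used here, the convention that index 0 is noisy speech and index 1 is clean speech, and the names `denoising_state` and `sample_training_example` are all assumptions made for illustration only.

```python
import random


def denoising_state(clean, noise, t):
    """Return the state at index t in [0, 1].

    Assumed convention (not stated in the abstract): t = 0 is the noisy
    speech, t = 1 is the clean speech, and the noise component decays
    linearly in between.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("state index t must lie in [0, 1]")
    return [c + (1.0 - t) * n for c, n in zip(clean, noise)]


def sample_training_example(clean, noise, rng):
    """Draw one (input, state index, target) triple for training.

    The network input is the noisy speech (state 0); the target is the
    state at a randomly sampled index, so the model sees the whole
    continuous process rather than only the clean end point.
    """
    t = rng.random()
    noisy = denoising_state(clean, noise, 0.0)
    target = denoising_state(clean, noise, t)
    return noisy, t, target


if __name__ == "__main__":
    clean = [0.5, -0.25]   # toy waveform samples (hypothetical values)
    noise = [0.125, 0.25]
    noisy, t, target = sample_training_example(clean, noise, random.Random(0))
    print(noisy, t, target)
```

Under these assumptions, choosing a state index slightly below 1 at test time would correspond to keeping a small amount of residual noise in the output, which is the setting the experiments report as beneficial.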

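This paper studies a continuous modeling approach for deep-learning-based speech enhancement, with a focus on the denoising process. A state variable is introduced to represent the denoising process. During training, a UNet-like neural network learns to estimate every state variable in the continuous denoising process; at test time, a controlling factor supplied as an embedding adjusts the level of noise reduction. The method enables controllable speech enhancement and suits different application scenarios. Experimental results show that retaining a small amount of noise in the clean target benefits speech enhancement, as confirmed by improvements in objective speech measures and automatic speech recognition performance.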

Continuous Modeling of the Denoising Process for Deep-Learning-Based Speech Enhancement