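TL;DR: We developed a real-time voice conversion model that produces native-sounding speech with minimal latency and can flexibly switch timbre, gender, and accent. It improves speech quality, enhances the recognition performance of existing ASR systems, and is suited to real-time multi-user communication scenarios.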
Abstract
Current foreign accent conversion (FAC) models rely on deep neural network
architectures, as well as on ensembles of neural networks for speech
recognition and speech generation. The use of th