Mar, 2022

WavThruVec: Latent speech representation as intermediate features for neural speech synthesis

TL;DR: WavThruVec is a two-stage neural text-to-speech architecture that uses high-dimensional Wav2Vec 2.0 embeddings as its intermediate speech representation. This allows the model to be trained on large-scale untranscribed audio corpora and gives it useful properties that enable tasks such as voice conversion and zero-shot synthesis.
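
The two-stage data flow can be pictured with a shape-only sketch. This is not the authors' implementation: the function names `text2vec` and `vec2wav`, the embedding size of 1024 (the Wav2Vec 2.0 Large feature size), the 20 ms frame hop at 16 kHz, and the fixed number of frames per character are all assumptions chosen for illustration, and both stages return placeholder values rather than running a trained network.

```python
import numpy as np

EMBED_DIM = 1024         # assumption: Wav2Vec 2.0 Large feature dimension
SAMPLES_PER_FRAME = 320  # assumption: one frame per 20 ms at 16 kHz


def text2vec(text: str, frames_per_char: int = 5) -> np.ndarray:
    """Hypothetical stage 1: map text to a sequence of embedding frames.

    A real model would predict durations and frame contents; here we only
    return random values with the expected (n_frames, EMBED_DIM) shape.
    """
    rng = np.random.default_rng(0)
    n_frames = len(text) * frames_per_char
    return rng.standard_normal((n_frames, EMBED_DIM)).astype(np.float32)


def vec2wav(frames: np.ndarray) -> np.ndarray:
    """Hypothetical stage 2: map embedding frames to a waveform.

    A real vocoder would generate audio; here we only return silence with
    the expected number of samples.
    """
    return np.zeros(frames.shape[0] * SAMPLES_PER_FRAME, dtype=np.float32)


frames = text2vec("hello")   # intermediate representation
audio = vec2wav(frames)      # final waveform
print(frames.shape, audio.shape)
```

Because the second stage consumes only embedding frames, it needs no transcripts in this picture: frames extracted directly from raw audio could serve as its input, which is the property that makes untranscribed corpora and voice conversion usable.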