March 2022
WavThruVec: Latent speech representation as intermediate features for neural speech synthesis
Hubert Siuzdak, Piotr Dura, Pol van Rijn, Nori Jacoby
TL;DR: WavThruVec is a two-stage neural text-to-speech architecture that uses high-dimensional Wav2Vec 2.0 embeddings as its intermediate speech representation. This allows the model to be trained on large-scale untranscribed audio corpora and gives it properties useful for tasks such as voice conversion and zero-shot synthesis.
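To make the two-stage data flow concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names (`text2vec`, `vec2wav`, `fake_wav2vec`, `vocoder_training_pair`), the 1024-dimensional embedding size, and the fixed four frames per character are all assumptions chosen for illustration, and both stages are placeholders rather than neural networks. It only shows how the shapes fit together and why, on one reading of the summary, the second stage could learn from audio without transcripts.

```python
import random

SAMPLE_RATE = 16_000  # Wav2Vec 2.0 operates on 16 kHz audio
FRAME_HOP = 320       # its feature encoder emits roughly one frame per 320 samples (20 ms)
EMBED_DIM = 1024      # assumption: hidden size of a "large" Wav2Vec 2.0 variant


def fake_wav2vec(waveform):
    """Hypothetical stand-in for a pretrained Wav2Vec 2.0 encoder: audio -> frames."""
    rng = random.Random(0)
    n_frames = len(waveform) // FRAME_HOP  # approximation of the real frame count
    return [[rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)] for _ in range(n_frames)]


def text2vec(text, frames_per_char=4):
    """Stage 1 placeholder: text -> embedding frames (this stage needs transcripts)."""
    rng = random.Random(1)
    n_frames = len(text) * frames_per_char  # assumption: fixed duration per character
    return [[rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)] for _ in range(n_frames)]


def vec2wav(frames):
    """Stage 2 placeholder: embedding frames -> waveform samples (silence here)."""
    return [0.0] * (len(frames) * FRAME_HOP)


def vocoder_training_pair(waveform):
    """Builds a stage 2 (input, target) pair from audio alone, with no transcript."""
    return fake_wav2vec(waveform), waveform


def synthesize(text):
    """Full pipeline: text -> intermediate embeddings -> waveform."""
    return vec2wav(text2vec(text))


audio = synthesize("hello")
print(len(audio) / SAMPLE_RATE, "seconds of placeholder audio")
```

In this sketch only `text2vec` ever sees text, while `vocoder_training_pair` needs nothing but a waveform, which is one way to read the claim about training on untranscribed corpora.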