Robust and fine-grained prosody control of end-to-end speech synthesis

Nov, 2018

Younggun Lee, Taesu Kim

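TL;DR: This work proposes prosody embeddings with temporal structure, which enable fine-grained control over the speaking style of synthesized speech. The pitch and amplitude of the synthesized speech can be changed at both the frame level and the phoneme level, and temporal normalization of the embeddings gives better robustness against perturbations in speaking style.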

Abstract

We propose prosody embeddings for emotional and expressive speech synthesis networks. The proposed methods introduce temporal structures i