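TL;DR: A neural incremental text-to-speech system built on the prefix-to-prefix framework enables online speech synthesis, reducing both the computational latency and the input latency to O(1).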
Abstract
Text-to-speech synthesis (TTS) has progressed rapidly in recent years,
with neural methods now capable of producing highly natural audio.
However, these efforts still suffer from two types of latency: (a) the {\em
computational latency} (synthesizing time), which gr