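TL;DR Using the CREMA-D dataset, an emotion-conditioned GAN generates phoneme lengths relative to neutral speech, which can be supplied to a TTS system to produce more expressive speech. A generative model trained with IMLE can also produce better machine-generated neutral speech, but further subjective evaluation studies are still needed.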
Abstract
Voice synthesis has seen significant improvements in the past decade,
resulting in highly intelligible voices. Further investigations have resulted
in models that can produce variable speech, including conditional emotional
expression. The problem lies, however, in a focus on phrase-lev