Recent advances in neural text-to-speech (TTS) have made it possible to produce high-quality synthesised speech. However, such TTS models depend on large amounts of data that are costly to produce and hard to scale to all existing languages, especially since low-resource languages seldom receive attention. Techniques such as knowledge transfer can alleviate the burden of creating datasets. In this paper, we therefore investigate two questions: first, whether data from social media can be used to construct a small TTS dataset, and second, whether cross-lingual transfer learning (TL) for a low-resource language can work with this type of data. For the second question, we specifically assess to what extent multilingual modeling can be leveraged as an alternative to training on monolingual corpora. To do so, we explore how data from foreign languages may be selected and pooled to train a TTS model for a target low-resource language. Our findings show that multilingual pre-training is better than monolingual pre-training at increasing the intelligibility and naturalness of the generated speech.

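This study addresses the challenge of constructing datasets for text-to-speech (TTS) models in low-resource languages, in particular building small datasets from data obtained from social media. Using cross-lingual transfer learning, it finds that multilingual pre-training outperforms monolingual pre-training at improving the intelligibility and naturalness of the generated speech, demonstrating its strong potential for low-resource language TTS.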

Multilingual Training Strategies for Low-Resource Text-to-Speech