BriefGPT.xyz
Jul, 2024
Enhancing Out-of-Vocabulary Performance of Indian TTS Systems for Practical Applications through Low-Effort Data Strategies
Srija Anand, Praveen Srinivasa Varadhan, Ashwin Sankar, Giri Raju, Mitesh M. Khapra
TL;DR
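Improves TTS systems for low-resource languages by having inexpensive volunteers record character bigrams that do not appear in the training data, which raises model performance on out-of-vocabulary words.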
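The core idea in the TL;DR, finding character bigrams that the training data never covers, can be sketched as below. This is a minimal illustration, not the authors' pipeline. Two things are assumptions: bigrams are taken over Unicode code points inside whitespace-separated words (for Indic scripts this separates vowel signs from their consonants, and the paper's exact unit may differ), and the greedy `pick_sentences` helper for choosing what volunteers should record is a hypothetical selection strategy the summary does not describe.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Iterator


def word_bigrams(text: str) -> Iterator[str]:
    """Yield adjacent code-point pairs inside each whitespace-separated word."""
    for word in text.split():
        for i in range(len(word) - 1):
            yield word[i:i + 2]


def char_bigrams(text: str) -> set[str]:
    """Return the set of distinct in-word character bigrams of `text`."""
    return set(word_bigrams(text))


def unseen_bigrams(train_texts: Iterable[str], target_texts: Iterable[str]) -> Counter:
    """Count bigrams in the target texts that never occur in the training texts."""
    seen: set[str] = set()
    for text in train_texts:
        seen |= char_bigrams(text)
    missing: Counter = Counter()
    for text in target_texts:
        for bigram in word_bigrams(text):
            if bigram not in seen:
                missing[bigram] += 1
    return missing


def pick_sentences(candidates: Iterable[str], missing: Counter, budget: int) -> list[str]:
    """Hypothetical helper: greedily pick up to `budget` candidate sentences,
    each time taking the one that covers the most still-uncovered bigrams."""
    uncovered = set(missing)
    pool = list(candidates)
    chosen: list[str] = []
    while pool and uncovered and len(chosen) < budget:
        best = max(pool, key=lambda s: len(char_bigrams(s) & uncovered))
        gain = char_bigrams(best) & uncovered
        if not gain:
            break  # no remaining candidate adds coverage
        chosen.append(best)
        uncovered -= gain
        pool.remove(best)
    return chosen
```

For example, with training text `"the cat sat"` and target words `"the bat"` and `"cab"`, the unseen bigrams are `ba` and `ab`, and the greedy step prefers a single candidate such as `"bab"` that covers both. Greedy set cover is used here only because it is simple and keeps each recording session short. It is not claimed to match the paper's method.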
Abstract
Publicly available TTS datasets for low-resource languages like Hindi and Tamil typically contain 10-20 hours of data, leading to poor vocabulary coverage. This limitation becomes evident in downstream applications…