Two common approaches to building high-performing ASR for low-resource languages are to use data from multiple languages and to augment the training data with acoustic variations. In this work we present a single grapheme-based ASR model trained on 7 geographically proximal languages, using standard hybrid BLSTM-HMM acoustic models with a lattice-free MMI objective. We build the shared ASR grapheme set by taking the union of the language-specific grapheme sets, and we find that the resulting multilingual ASR model can perform language-independent recognition on all 7 languages and substantially outperforms each monolingual ASR model. Second, we evaluate the efficacy of multiple data augmentation alternatives within each language, as well as their complementarity with multilingual modeling. Overall, we show that the proposed multilingual ASR with various data augmentation can not only recognize any language in the training set, but also provides large ASR performance improvements.
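The union-based grapheme set described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy lexicons, and the choice to treat every Unicode code point as one grapheme are assumptions made for illustration only (scripts with combining marks would likely need proper grapheme-cluster segmentation).

```python
from typing import Dict, Iterable, List


def build_grapheme_set(lexicons: Dict[str, Iterable[str]]) -> List[str]:
    """Return the sorted union of the graphemes seen in every language.

    Assumption: one Unicode code point is one grapheme.
    """
    graphemes = set()
    for words in lexicons.values():
        for word in words:
            graphemes.update(word)  # adds each character of the word
    # Sorting gives a stable order, so unit ids are reproducible.
    return sorted(graphemes)


def word_to_graphemes(word: str) -> List[str]:
    """Graphemic 'pronunciation' of a word: its character sequence."""
    return list(word)


# Hypothetical toy lexicons standing in for two of the languages.
toy_lexicons = {
    "lang_a": ["abc", "cab"],
    "lang_b": ["bcd", "dab"],
}

shared = build_grapheme_set(toy_lexicons)
grapheme_to_id = {g: i for i, g in enumerate(shared)}
```

Because every language maps into the same unit inventory, a single acoustic model can be trained on the pooled data and decoded without knowing the input language in advance.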

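This paper presents a single grapheme-based ASR model, trained with standard hybrid BLSTM-HMM acoustic models and a lattice-free MMI objective, that performs language-independent recognition on seven languages and outperforms each monolingual ASR model. We also evaluate several data augmentation methods and show that the proposed multilingual graphemic hybrid ASR, combined with various data augmentation, can not only recognize any language in the training set but also substantially improve ASR performance.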

Multilingual Graphemic Hybrid ASR with Massive Data Augmentation