This paper provides an overall introduction of our Automatic Speech
Recognition (ASR) systems for southeast asian languages. As not much existing
work has been carried out on such regional languages, a few difficulties should
be addressed before building the systems: limitation on spee
本研究介绍了一种用于从有声读物生成 ASR 训练数据集的新型流程,以应对资源稀缺语言中自动语音识别系统性能较差的问题。该方法通过有效地对齐音频和相应的文本,并将其分割成适合 ASR 训练的长度,简化了资源稀缺语言中 ASR 系统的数据准备工作,并通过对亚美尼亚语的案例研究证明了其应用价值。这种方法可以适用于许多资源稀缺语言,不仅解决了数据匮乏问题,还提高了低资源语言的 ASR 模型性能。