Aug, 2022
Investigating data partitioning strategies for crosslinguistic low-resource ASR evaluation
Zoey Liu, Justin Spence, Emily Prud'hommeaux
TL;DR
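This study examines model performance under ten different data partitioning methods for five languages with scarce training resources. It reveals how the choice of which speakers' data is held out affects model performance, and shows that, when data is scarce, partitioning based on random splits can yield more reliable and generalizable results.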
Abstract
Many automatic speech recognition (ASR) data sets include a single pre-defined test set consisting of one or more speakers whose speech never appears in the training set. This "hold-speaker(s)-out" data partitioning
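To make the two kinds of split concrete, here is a minimal Python sketch contrasting a hold-speaker(s)-out partition with a random partition. It is an illustration only, not the authors' code: the toy corpus `UTTERANCES`, the function names `hold_speakers_out_split`, `random_split` and `speakers`, and the 25% test fraction are all assumptions made for this example.

```python
import random

# Hypothetical toy corpus: (speaker_id, utterance_id) pairs,
# 4 speakers with 5 utterances each. Not data from the paper.
UTTERANCES = [(f"spk{s}", f"spk{s}_utt{u}") for s in range(4) for u in range(5)]


def hold_speakers_out_split(utterances, test_speakers):
    """Put every utterance from the chosen speakers in the test set.

    No test speaker contributes any utterance to the training set.
    """
    test_speakers = set(test_speakers)
    train = [u for u in utterances if u[0] not in test_speakers]
    test = [u for u in utterances if u[0] in test_speakers]
    return train, test


def random_split(utterances, test_fraction=0.25, seed=0):
    """Shuffle utterances and cut off a test set, ignoring speaker identity.

    The same speaker may appear in both the training and the test set.
    """
    rng = random.Random(seed)  # seeded so each split is reproducible
    shuffled = list(utterances)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def speakers(split):
    """Return the set of speaker ids present in a split."""
    return {spk for spk, _ in split}


if __name__ == "__main__":
    train, test = hold_speakers_out_split(UTTERANCES, ["spk3"])
    print("held-out test speakers:", sorted(speakers(test)))
    print("speaker overlap:", sorted(speakers(train) & speakers(test)))

    # Several random splits with different seeds, so that results can be
    # compared across partitions rather than reported for a single one.
    for seed in range(3):
        r_train, r_test = random_split(UTTERANCES, seed=seed)
        print(f"seed {seed}: {len(r_train)} train / {len(r_test)} test")
```

Evaluating on several seeded random splits, or on several different held-out speakers, exposes how much a reported score depends on the particular partition, which is the kind of variation the summary above refers to.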