Sep, 2023

Updated Corpora and Benchmarks for Long-Form Speech Recognition

Jennifer Drexler Fox, Desh Raj, Natalie Delworth, Quinn McNamara, Corey Miller...

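TL;DR: This paper re-releases three standard ASR corpora for use in long-form ASR research, studies the mismatch between training and test data, and benchmarks long-form training to show that it makes models more robust under this domain shift.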

Abstract

The vast majority of ASR research uses corpora in which both the training and test data have been pre-segmented into utterances. In most r