BriefGPT.xyz
Aug, 2024
MooER: LLM-based Speech Recognition and Translation Models from Moore Threads
Junhao Xu, Zhenlin Liang, Yi Liu, Yichao Hu, Jian Li...
TL;DR
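This work addresses the shortage of training data for large speech recognition and translation models by proposing a new training strategy that uses 5000 hours of pseudo-labeled data. The MooER model performs well in evaluation, reaching a BLEU score of 25.2, which shows an advantage over other open-source models and indicates broad potential for applications and further research.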
Abstract
In this paper, we present MooER, an LLM-based large-scale automatic speech recognition (ASR) / automatic speech translation (AST) model of Moore Threads. A 5000h pseudo labeled dataset containing ...