BriefGPT.xyz
Jun, 2024
MaLa-ASR: Multimedia-Assisted LLM-Based ASR
Guanrou Yang, Ziyang Ma, Fan Yu, Zhifu Gao, Shiliang Zhang...
TL;DR
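Proposes MaLa-ASR, an LLM-based ASR model that integrates textual keywords extracted from presentation slides to improve recognition of conference content. Adding the keywords to the input prompt reduces the biased word error rate (B-WER) by 46.0% and 44.2% relative, setting a new SOTA on the dataset.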
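The idea of adding slide keywords to the input prompt can be sketched as below. This is only an illustration under stated assumptions: the function name `build_prompt`, the `max_keywords` cap, and the wording of the prompt template are all hypothetical and are not taken from the paper, which may format its prompt differently.

```python
def build_prompt(keywords, max_keywords=50):
    """Build a text prompt that lists slide keywords for an LLM-based ASR model.

    The template wording and the max_keywords cap are hypothetical choices
    made for illustration; they are not the paper's actual prompt.
    """
    # Drop empty entries and case-insensitive duplicates, preserving order.
    seen = set()
    unique = []
    for kw in keywords:
        kw = kw.strip()
        if kw and kw.lower() not in seen:
            seen.add(kw.lower())
            unique.append(kw)

    # Cap the list so the prompt stays short.
    unique = unique[:max_keywords]

    # Without keywords, fall back to a plain transcription instruction.
    if not unique:
        return "Transcribe speech to text."

    return (
        "Transcribe speech to text. "
        "Use keywords from the slides if relevant: " + ", ".join(unique) + "."
    )


if __name__ == "__main__":
    print(build_prompt(["MaLa-ASR", "B-WER", "LLM"]))
```

In a setup like this, the resulting string would be tokenized and fed to the LLM together with the speech representation, so rare or domain-specific words shown on the slides are visible to the model at decoding time.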
Abstract
As more and more information-rich data like video become available, utilizing multi-modal auxiliary information to enhance audio tasks has sparked widespread research interest. The recent surge in research on LLM-based ...