Recent advances in automatic speech recognition (ASR) have accelerated its adoption in the medical domain, yet the performance of ASR models on accented medical named entities (NEs), such as drug names, diagnoses, and lab results, is largely unknown. We rigorously evaluate multiple ASR models on a clinical English dataset covering 93 African accents. Our analysis reveals that although some models achieve low overall word error rates (WER), their error rates on clinical entities are higher, posing potentially substantial risks to patient safety. To demonstrate this empirically, we extract clinical entities from transcripts, develop a novel algorithm to align ASR predictions with these entities, and compute medical NE recall, medical WER, and character error rate (CER). Our results show that fine-tuning on accented clinical speech improves medical WER by a wide margin (25-34% relative), making these models more practical for healthcare settings.
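To make the metrics named above concrete, the following is a minimal sketch of how WER, CER, and an entity-level recall could be computed. It is not the alignment algorithm developed in this work: the function names (`edit_distance`, `wer`, `cer`, `entity_recall`) are hypothetical, normalization is reduced to lowercasing and whitespace splitting, and an entity counts as recalled only if it appears verbatim as a contiguous word sequence in the ASR output. The example sentence is invented for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: char-level edit distance / reference length."""
    ref, hyp = reference.lower(), hypothesis.lower()
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def entity_recall(entities, hypothesis):
    """Fraction of reference entities found verbatim in the hypothesis.

    Assumption: exact, contiguous word match. A real evaluation would
    need a more tolerant alignment between entities and predictions.
    """
    hyp_words = hypothesis.lower().split()
    found = 0
    for entity in entities:
        phrase = entity.lower().split()
        n = len(phrase)
        if n and any(hyp_words[i:i + n] == phrase
                     for i in range(len(hyp_words) - n + 1)):
            found += 1
    return found / len(entities) if entities else 0.0


# Invented example: the drug name is split into two wrong words.
reference = "patient was started on metformin 500 mg daily"
hypothesis = "patient was started on met forming 500 mg daily"
entities = ["metformin", "500 mg"]

print(wer(reference, hypothesis))           # 2 edits over 8 reference words
print(entity_recall(entities, hypothesis))  # 1 of 2 entities recovered
print(cer("metformin", "met forming"))      # 2 edits over 9 reference chars
```

In this toy case the sentence-level WER is 0.25, while half of the clinical entities are lost, which illustrates how a modest overall error rate can mask errors concentrated in the terms that matter most clinically.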

Performant ASR Models for Medical Entities in Accented Speech