Although Automatic Speech Recognition (ASR) systems are widely used in real-world applications, they often generalize poorly to new domains and must be finetuned on data from those domains. However, target-domain data are often not readily available. In this paper, we propose a new strategy for adapting ASR models to new target domains without any text or speech from those domains. To accomplish this, we introduce a novel data synthesis pipeline that uses a Large Language Model (LLM) to generate a target-domain text corpus and a state-of-the-art controllable speech synthesis model to generate the corresponding speech. We further propose a simple yet effective in-context instruction finetuning strategy to make the LLM more effective at generating text corpora for new domains. Experiments on the SLURP dataset show that the proposed method achieves an average relative word error rate improvement of $28\%$ on unseen target domains without any performance drop in source domains.
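
For readers unfamiliar with the metric, a relative word error rate (WER) improvement is conventionally the reduction in WER divided by the baseline WER. The following is a minimal sketch of that arithmetic; the function name `relative_wer_improvement` and the input values are hypothetical, chosen only for illustration, and are not results reported in the paper.

```python
def relative_wer_improvement(baseline_wer: float, adapted_wer: float) -> float:
    """Return the relative WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical WER values (illustration only, not taken from the paper):
# a baseline model at 25% WER and an adapted model at 18% WER.
baseline = 0.25
adapted = 0.18
improvement = relative_wer_improvement(baseline, adapted)
print(f"Relative WER improvement: {improvement:.0%}")
```

Under this convention, a relative improvement is measured against the baseline error, so it is larger than the absolute difference in WER whenever the baseline WER is below 100%.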

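This paper proposes a new strategy for adapting ASR models to new target domains: a large language model generates a target-domain text corpus, and a state-of-the-art controllable speech synthesis model generates the corresponding speech. In-context instruction finetuning makes the large language model more effective at generating text corpora for new domains. Experiments show an average relative word error rate improvement of 28% on unseen target domains, with no performance drop in source domains.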

Corpus Synthesis for Zero-Shot ASR Domain Adaptation Using Large Language Models