Although many speakers of low-resource languages regularly code-switch between their languages and other regional languages or English, datasets of code-switched speech are too small to train bespoke acoustic models from scratch or to perform language model rescoring. We propose finetuning self-supervised speech representations, such as wav2vec 2.0 XLSR, to recognize code-switched speech. We find that finetuning self-supervised multilingual representations and augmenting them with n-gram language models trained on transcripts reduces word error rates by up to 20% absolute compared to baseline hybrid models trained from scratch on code-switched data. Our findings suggest that, when training data is limited, finetuning self-supervised representations is a viable and better-performing solution.

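Multilingual self-supervised speech representations improve speech recognition of low-resource African languages with code-switching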