This paper proposes Allophant, a multilingual phoneme recognizer. It requires only a phoneme inventory for cross-lingual transfer to a target language, which enables recognition in low-resource settings. The architecture combines a compositional phone embedding approach with individually supervised phonetic attribute classifiers in a multi-task setup. We also introduce Allophoible, an extension of the PHOIBLE database. Combined with a distance-based mapping approach for grapheme-to-phoneme outputs, it allows us to train on PHOIBLE inventories directly. Training and evaluating on 34 languages, we find that adding multi-task learning improves the model's ability to generalize to unseen phonemes and phoneme inventories. On supervised languages, we achieve phoneme error rate (PER) improvements of 11 percentage points (pp.) compared to a baseline without multi-task learning. Evaluation of zero-shot transfer on 84 languages yields a decrease in PER of 2.63 pp. over the baseline.
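
To illustrate what a distance-based mapping of grapheme-to-phoneme outputs onto a target inventory could look like, the following is a minimal sketch and not the paper's implementation. It assumes a small hypothetical table of binary articulatory attributes (`FEATURES`) and uses Hamming distance as the metric; both the attribute set and the choice of metric are assumptions made for illustration only.

```python
# Hypothetical binary attribute vectors: (voiced, nasal, labial, plosive).
# This toy table is an assumption; it is not taken from PHOIBLE or Allophoible.
FEATURES = {
    "p": (0, 0, 1, 1),
    "b": (1, 0, 1, 1),
    "m": (1, 1, 1, 0),
    "t": (0, 0, 0, 1),
    "d": (1, 0, 0, 1),
    "n": (1, 1, 0, 0),
}


def hamming(a, b):
    """Count the attribute positions in which two vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def map_to_inventory(phoneme, inventory):
    """Map a G2P output phoneme to the closest phoneme in the inventory.

    Phonemes already in the inventory are returned unchanged; otherwise the
    inventory phoneme with the smallest Hamming distance is chosen
    (ties resolve to the first candidate in inventory order).
    """
    if phoneme in inventory:
        return phoneme
    return min(inventory, key=lambda c: hamming(FEATURES[phoneme], FEATURES[c]))


inventory = ["p", "t", "m"]
mapped = [map_to_inventory(ph, inventory) for ph in ["b", "n", "t"]]
print(mapped)
```

In this toy setting, a G2P output such as /b/ that is missing from the inventory is replaced by its nearest neighbour in attribute space, so that training transcriptions only contain phonemes from the inventory.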

Allophant: Cross-lingual Phoneme Recognition with Articulatory Attributes