BriefGPT.xyz
Jun, 2024
Towards Robust Speech Representation Learning for Thousands of Languages
William Chen, Wangyou Zhang, Yifan Peng, Xinjian Li, Jinchuan Tian...
TL;DR
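We propose XEUS, a Cross-lingual Encoder for Universal Speech, trained on over 1 million hours of data across 4057 languages. It extends the language coverage of SSL models 4-fold and achieves results that outperform or are comparable to state-of-the-art SSL models across multiple benchmarks.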
Abstract
Self-supervised learning (SSL) has helped extend speech technologies to more languages by reducing the need for labeled data. However, models are still far from supporting the world's 7000+ languages. We propose