Mar, 2024
XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception
HyoJung Han, Mohamed Anwar, Juan Pino, Wei-Ning Hsu, Marine Carpuat...
TL;DR
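XLAVS-R, a cross-lingual audio-visual speech representation model, improves the robustness of speech recognition and translation in noisy environments and shows strong cross-lingual audio-visual capability across more than 100 languages.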
Abstract
Speech recognition and translation systems perform poorly on noisy inputs, which are frequent in realistic environments. Augmenting these systems with visual signals has the potential to improve robustness to noi…