Cal Peyser, Ronny Huang, Andrew Rosenberg, Tara N. Sainath, Michael Picheny, Kyunghyun Cho
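TL;DR: This work constructs a joint-modeling task for learning acoustic representations that emphasizes disentangling the relevant and irrelevant parts of the audio signal. It then shows that ideal, disentangled solutions have unique statistical properties, and enforces those properties during training, yielding an average relative WER improvement of 24.5%. This suggests a new and effective approach to learning audio representations.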
Abstract
The careful construction of audio representations has become a dominant feature in the design of approaches to many speech tasks. Increasingly, such approaches have emphasized "disentanglement", where a represent