Self-supervised speech models have been shown to be useful for various tasks, but their large size limits their use on devices with limited computing power and memory. In this work, we explore early exit, an approach that reduces latency by terminating the forward pass of a network early. Most early-exit approaches require a separate early-exit model for each task, and some even require fine-tuning the entire pretrained model. We introduce Data Adaptive Self-Supervised Early Exit (DAISY), an approach that decides when to exit based on the self-supervised loss, eliminating the need for multiple rounds of training and fine-tuning. DAISY matches the performance of HuBERT on the MiniSUPERB benchmark, but with much faster inference. Our analysis of DAISY's adaptivity shows that the model exits early (using fewer layers) on clean data and late (using more layers) on noisy data, dynamically adjusting the computational cost of inference to the noise level of each sample.
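The exit rule can be illustrated with a minimal Python sketch. It is a hypothetical simplification, not the paper's exact criterion. The sketch assumes that each layer has a small head predicting pseudo-label clusters (in the spirit of HuBERT's targets). It also assumes that the head's predictive entropy stands in for the self-supervised loss, because the loss itself needs targets that may not be available at inference time. The names `early_exit_forward`, `heads`, and `threshold`, and the scalar "hidden state", are illustrative only.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical types: a "layer" maps a hidden state to a new hidden state,
# and a "head" maps a hidden state to logits over pseudo-label clusters.
Layer = Callable[[float], float]
Head = Callable[[float], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def early_exit_forward(
    x: float,
    layers: Sequence[Layer],
    heads: Sequence[Head],
    threshold: float,
) -> Tuple[float, int]:
    """Run layers in order and stop at the first layer whose head is confident.

    Assumption: the head's predictive entropy is a proxy for the
    self-supervised loss. Returns (hidden_state, number_of_layers_used).
    """
    h = x
    for depth, (layer, head) in enumerate(zip(layers, heads), start=1):
        h = layer(h)
        if entropy(softmax(head(h))) < threshold:
            return h, depth  # exit early: the remaining layers are skipped
    return h, len(layers)  # no exit triggered: the full depth was used


# Toy 12-layer model: each layer raises the "confidence" of the hidden state,
# and every head scores 3 hypothetical clusters, favoring the first one.
layers = [lambda h: h + 1.0] * 12
heads = [lambda h: [h, 0.0, 0.0]] * 12

clean_state, clean_depth = early_exit_forward(1.0, layers, heads, threshold=0.5)
noisy_state, noisy_depth = early_exit_forward(-3.0, layers, heads, threshold=0.5)
print(clean_depth, noisy_depth)  # 2 6
```

In this toy setup, the "clean" input starts from a more confident hidden state and exits after 2 layers. The "noisy" input needs 6 layers, which mirrors the adaptivity described above. In a real model, the heads and the threshold would come from training rather than being hand-picked.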

DAISY: Data Adaptive Self-Supervised Early Exit for Speech Representation Models