Speech separation remains an important topic for multi-speaker technology researchers. Convolution-augmented transformers (conformers) have performed well on many speech processing tasks but remain under-researched for speech separation. Most recent state-of-the-art (SOTA) separation models have been time-domain audio separation networks (TasNets). A number of successful models use dual-path (DP) networks, which sequentially process local and global information. Time-domain conformers (TD-Conformers) are an analogue of the DP approach in that they also process local and global context sequentially, but they have a different time complexity function. It is shown that for realistic shorter signal lengths, conformers are more efficient when controlling for feature dimension. Subsampling layers are proposed to further improve computational efficiency. The best TD-Conformer achieves 14.6 dB and 21.2 dB SI-SDR improvement on the WHAMR and WSJ0-2Mix benchmarks, respectively.
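
The results above are reported as SI-SDR improvement, i.e. the scale-invariant signal-to-distortion ratio of the separated estimate minus that of the unprocessed mixture, both measured against the clean target. The following is a minimal sketch of that commonly used definition, not the authors' evaluation code. The function names, the `eps` stabiliser, and the synthetic signals are assumptions made purely for illustration.

```python
import numpy as np


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant SDR in dB for two 1-D signals of equal length."""
    # Remove the mean so the metric ignores DC offsets.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to find the optimally scaled target.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    # Everything in the estimate that is not explained by the target is distortion.
    distortion = estimate - projection
    ratio = (np.dot(projection, projection) + eps) / (np.dot(distortion, distortion) + eps)
    return 10.0 * np.log10(ratio)


def si_sdr_improvement(estimate, mixture, target):
    """SI-SDR of the estimate minus SI-SDR of the unprocessed mixture."""
    return si_sdr(estimate, target) - si_sdr(mixture, target)


# Synthetic illustration (assumed data, not a benchmark): a "separator" that
# attenuates the interfering signal by a factor of ten.
rng = np.random.default_rng(0)
target = rng.standard_normal(8000)
interferer = rng.standard_normal(8000)
mixture = target + interferer
estimate = target + 0.1 * interferer

improvement = si_sdr_improvement(estimate, mixture, target)
print(f"SI-SDR improvement: {improvement:.1f} dB")
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is why the metric is called scale-invariant.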

On Time-Domain Conformer Models for Noisy Multi-Channel Speech Separation