Always-on machine learning models require a very low memory and compute footprint. Their restricted parameter count limits both a model's capacity to learn and the ability of the usual training algorithms to find the best parameters. Here we show that a small →