Sliding Window Sum algorithms have been successfully used for training and inference of Deep Neural Networks (DNNs). We have previously shown how both pooling and convolution 1-D primitives can be expressed as sliding sums and evaluated by compute kernels that share a common structure. In this paper, we present an extensive study of the Sliding Window convolution technique as a more efficient alternative to the commonly used General Matrix Multiplication (GEMM) based convolution in DNNs. The Sliding Window technique addresses the memory bloating problem of GEMM-based convolution and demonstrates a significant speedup in 2-D convolution. We evaluate the technique across a range of implementations, including custom kernels for specific filter sizes. Our results suggest that Sliding Window computation kernels can outperform GEMM-based convolution on a CPU and even on dedicated hardware accelerators. This could promote wider adoption of AI on low-power and low-memory devices without the need for specialized hardware. We also discuss the compatibility of model compression methods and optimized network architectures with the Sliding Window technique, encouraging further research in these areas.
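
To make the contrast concrete, the following is a minimal scalar sketch in plain Python and not the kernels evaluated in the paper. It assumes the usual im2col lowering for the GEMM-based path, which copies every input window into a row of a matrix, and it compares that with a single sliding kernel that serves both 1-D convolution and max pooling by swapping the combining operator. The names `conv1d_im2col` and `sliding_window` are hypothetical and chosen only for illustration.

```python
from operator import add


def conv1d_im2col(x, w):
    """GEMM-style 1-D convolution (assumed im2col lowering).

    Every window of the input is copied into its own row, then each row is
    multiplied by the filter. Returns the output and the number of values
    held in the intermediate buffer.
    """
    k = len(w)
    n_out = len(x) - k + 1
    cols = [x[i:i + k] for i in range(n_out)]  # n_out * k copied values
    y = [sum(a * b for a, b in zip(row, w)) for row in cols]
    return y, n_out * k


def sliding_window(x, k, op, weights=None):
    """Shared sliding kernel: combine k shifted views of x with `op`.

    With op=add and a filter as `weights` this is a 1-D convolution
    (cross-correlation, as is conventional in DNNs); with op=max and no
    weights it is max pooling with stride 1. Only the output buffer of
    n_out values is allocated.
    """
    w = weights if weights is not None else [1.0] * k
    n_out = len(x) - k + 1
    y = [w[0] * x[i] for i in range(n_out)]
    for j in range(1, k):
        for i in range(n_out):
            y[i] = op(y[i], w[j] * x[i + j])
    return y


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
w = [1.0, 0.0, -1.0]

y_gemm, buffer_size = conv1d_im2col(x, w)
y_slide = sliding_window(x, len(w), add, w)
y_pool = sliding_window(x, 3, max)
```

In this sketch the im2col buffer holds `(n - k + 1) * k` values against `n - k + 1` for the sliding kernel, which is the memory growth the abstract refers to. A production kernel would vectorize the inner loop; that part is omitted here.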

Accelerating Machine Learning Primitives on Commodity Hardware