Convolutional Neural Networks (CNNs) are extensively used in image and video recognition, natural language processing and other machine learning applications. The success of CNNs in these areas corresponds with a significant increase in the number of parameters and computation costs. Recent approaches towards reducing these overheads involve pruning and compressing the weights of various layers without hurting the overall CNN performance. However, using model compression to generate sparse CNNs mostly reduces parameters from the fully connected layers and may not significantly reduce the final computation costs. In this paper, we present a compression technique for CNNs, where we prune the filters from CNNs that are identified as having a small effect on the output accuracy. By removing whole planes in the network, together with their connecting convolution kernels, the computational costs are reduced significantly. In contrast to other techniques proposed for pruning networks, this approach does not result in sparse connectivity patterns. Hence, our techniques do not need the support of sparse convolution libraries and can work with the most efficient BLAS operations for matrix multiplications. In our results, we show that even simple filter pruning techniques can reduce inference costs for VGG-16 by up to 34% and ResNet-110 by up to 38% while regaining close to the original accuracy by retraining the networks.

本文提出了一种基于滤波器减少方法的 CNNs 加速方法，它不依赖稀疏卷积库，通过移除对输出准确性影响较小的整个滤波器及其连接的特征图，大大降低了计算成本，在 CIFAR10 数据集上可以使 VGG-16 推理时间减少 34%、ResNet-110 推理时间减少 38%，并且通过重新训练网络可以接近原始准确性。

高效卷积神经网络中的滤波器裁剪