Convolutional neural networks have shown tremendous performance in computer vision tasks,but their excessive amount of weights and operations prevent them from being adopted in embedded environments. One of the solutions involves pruning, where some unimportant weights are forced to be zero. Many pruning schemes have been proposed, but have focused mainly on the number of pruned weights. The previous pruning schemes hardly considered ASIC or FPGA accelerator architectures. When the pruned networks are run on the accelerators, the lack of architecture consideration casues some inefficiency problems including internal buffer mis-alignment and load imbalance. This paper proposes a new pruning scheme that reflects accelerator architectures. In the proposed scheme, pruning is performed so that the same number of weights remain for each weight group corresponding to activations fetched simultaneously. In this way, the pruning scheme resolves the inefficiency problems. Even with the constraint, the proposed pruning scheme reached a pruning ratio similar to that of the previous unconstrained pruning schemes not only in AlexNet and VGG16 but also in the state-of-the-art very-deep networks like ResNet. Furthermore, the proposed scheme demonstrated a comparable pruning ratio in slimmed networks that were already pruned channel-wisely. In addition to improving the efficiency of previous sparse accelerators, it will be also shown that the proposed pruning scheme can be used to reduce the logic complexity of sparse accelerators.

在嵌入式环境中，卷积神经网络因其过多的权重存储和算术运算而未能得到广泛应用，为解决这一问题，本文提出了一种新的修剪方案，以反映加速器架构，通过此方案，性能得到了大幅提升，并成功应用于AlexNet，VGG16，ResNet，MobileNet等多种网络模型。

卷积神经网络的加速器感知剪枝