BriefGPT.xyz
June 2020
Directional Pruning of Deep Neural Networks
Shih-Kang Chao, Zhanyu Wang, Yue Xing, Guang Cheng
TL;DR
Proposes a novel directional pruning method that searches for sparse solutions within or near the flat region of the training loss, and shows that at high sparsity the method reaches the same minimum valley as SGD for networks such as ResNet50, VGG16, and WideResNet 28x10, with the minima found leaving the training loss unaffected.
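The paper's own directional pruning algorithm is not reproduced here. As a hedged illustration of the pruning setting it improves upon, the sketch below shows plain magnitude pruning in NumPy: the smallest-magnitude weights are zeroed to reach a target sparsity. The function name `magnitude_prune` and the `sparsity` parameter are illustrative assumptions, not from the paper; directional pruning instead chooses which coordinates to zero so that the solution stays in the flat minimum valley found by SGD.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    This is a generic baseline, not the paper's method: directional pruning
    additionally accounts for the loss landscape so pruning does not leave
    the flat region of the training loss.
    """
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    # Threshold = k-th smallest absolute value; everything at or below it is cut.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

w = np.array([0.05, -1.2, 0.3, -0.01, 0.8, 0.002])
print(magnitude_prune(w, 0.5))  # half the entries are zeroed
```

Magnitude pruning can hurt the loss because it ignores how sensitive the loss is to each weight; the paper's contribution is to pick the pruning direction so that, even at high sparsity (e.g. on ResNet50, VGG16, WideResNet 28x10), the training loss is unaffected.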
Abstract
In the light of the fact that the stochastic gradient descent (SGD) often finds a flat minimum valley in the training loss, we propose a novel directional …