We demonstrate the possibility of what we call sparse learning: accelerated training of deep neural networks that maintain sparse weights throughout training while achieving performance levels competitive with dense networks. We accomplish this by developing sparse momentum, an algorithm that uses exponentially smoothed gradients (momentum) to identify the layers and weights that reduce the error efficiently. Sparse momentum redistributes pruned weights across layers according to the mean momentum magnitude of each layer. Within a layer, sparse momentum grows weights according to the momentum magnitude of zero-valued weights. We demonstrate state-of-the-art sparse performance on MNIST, CIFAR-10, and ImageNet, decreasing the mean error by a relative 8%, 15%, and 6%, respectively, compared to other sparse algorithms. Furthermore, we show that our algorithm can reliably find the equivalent of winning lottery tickets from random initialization: it finds sparse configurations with 20% or fewer of the weights that perform as well as, or better than, their dense counterparts. Sparse momentum also decreases training time: it requires a single training run, with no re-training, and increases training speed by up to 11.85x. In our analysis, we show that our sparse networks might be able to reach dense performance levels by learning more general features, which are useful to a broader range of classes than the features learned by dense networks.
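The prune, redistribute, and regrow cycle described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `sparse_momentum_step`, the `prune_rate` parameter, the dictionary layout, the use of magnitude pruning as the removal criterion, averaging momentum over active weights only, and initializing regrown weights to zero are all assumptions made for illustration.

```python
import numpy as np

def sparse_momentum_step(weights, momenta, masks, prune_rate=0.2):
    """One illustrative prune / redistribute / regrow cycle (hypothetical helper).

    weights, momenta: dict of layer name -> float array (same shapes).
    masks: dict of layer name -> boolean array, True where a weight is active.
    Arrays are modified in place; the masks dict is returned for convenience.
    """
    pruned_total = 0
    contribution = {}
    for name, w in weights.items():
        mask = masks[name]
        active = np.flatnonzero(mask)
        # Layer score: mean momentum magnitude (assumed: over active weights).
        if active.size:
            contribution[name] = float(np.abs(momenta[name].flat[active]).mean())
        else:
            contribution[name] = 0.0
        # Assumed pruning criterion: drop the smallest-magnitude active weights.
        k = int(prune_rate * active.size)
        if k > 0:
            order = np.argsort(np.abs(w.flat[active]))
            drop = active[order[:k]]
            mask.flat[drop] = False
            w.flat[drop] = 0.0
            pruned_total += k

    total = sum(contribution.values())
    for name in weights:
        mask = masks[name]
        # Redistribute the pruned budget in proportion to each layer's score.
        share = contribution[name] / total if total > 0 else 1.0 / len(weights)
        inactive = np.flatnonzero(~mask)
        n_grow = min(int(share * pruned_total), inactive.size)
        if n_grow > 0:
            # Grow the zero-valued weights with the largest momentum magnitude.
            order = np.argsort(-np.abs(momenta[name].flat[inactive]))
            grow = inactive[order[:n_grow]]
            mask.flat[grow] = True  # regrown weights start at zero (assumption)
    return masks
```

Because the per-layer shares are rounded down and capped by the number of free positions, this sketch may regrow slightly fewer weights than it prunes; a fuller version would carry the remainder over to other layers.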

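This paper develops an algorithm called sparse momentum, an accelerated training method for deep neural networks that keeps the weights sparse throughout training while reaching dense performance levels. Experiments show that sparse momentum reliably reproduces dense performance levels and provides up to a 5.61x training speedup.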

Sparse Networks from Scratch: Faster Training without Losing Performance