It is commonly believed that network pruning not only reduces the computational cost of deep networks but also prevents overfitting by decreasing model capacity. However, our work finds that network pruning can sometimes aggravate overfitting. We report an unexpected sparse double descent phenomenon: as model sparsity is increased via network pruning, test performance first gets worse (due to overfitting), then improves (as overfitting is relieved), and finally gets worse again (as useful information is forgotten). While recent studies have focused on deep double descent with respect to model overparameterization, they did not recognize that sparsity may also cause double descent. This paper makes three main contributions. First, we report the novel sparse double descent phenomenon through extensive experiments. Second, to explain this phenomenon, we propose a novel learning distance interpretation: the curve of the $\ell_{2}$ learning distance of sparse models (from initialized parameters to final parameters) may correlate well with the sparse double descent curve and reflect generalization better than minima flatness. Third, in the context of sparse double descent, a winning ticket in the lottery ticket hypothesis may, surprisingly, not always win.
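To make the learning distance concrete, the following is a minimal sketch of how the $\ell_{2}$ distance from initialized parameters to final parameters could be computed for a sparse model. The function name `learning_distance`, the flat-list parameter representation, and the optional binary `mask` (1 for a kept weight, 0 for a pruned one) are assumptions made for illustration; the abstract does not specify how pruned weights are handled or how the quantity is implemented.

```python
import math


def learning_distance(theta_init, theta_final, mask=None):
    """Return the l2 distance between initial and final parameters.

    Assumption (not stated in the abstract): `mask` is a binary list where
    1 keeps a weight and 0 marks it as pruned, and pruned coordinates are
    excluded from the distance.
    """
    if len(theta_init) != len(theta_final):
        raise ValueError("parameter vectors must have the same length")
    if mask is None:
        # Dense model: every coordinate contributes.
        mask = [1] * len(theta_init)
    if len(mask) != len(theta_init):
        raise ValueError("mask must have the same length as the parameters")

    # Sum of squared differences over the surviving (unpruned) weights.
    squared = sum(
        m * (final - init) ** 2
        for init, final, m in zip(theta_init, theta_final, mask)
    )
    return math.sqrt(squared)
```

Evaluating this quantity for the trained model at each sparsity level would yield a curve that could be plotted alongside test accuracy to check, under these assumptions, whether the two move together.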

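Our study finds that, as model sparsity is increased via network pruning, test performance exhibits a sparse double descent phenomenon: it first declines, then rises to a peak, and finally declines again. We propose a new learning distance interpretation that tracks the sparse double descent curve well and reflects generalization better than minima flatness. In addition, we find that, under sparse double descent, the advantage claimed by the lottery ticket hypothesis does not always hold.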

Sparse Double Descent: Network Pruning Aggravates Overfitting