BriefGPT.xyz
Jul, 2023
Learning Sparse Neural Networks with Identity Layers
Mingjian Ni, Guangyao Chen, Xiawu Zheng, Peixi Peng, Li Yuan...
TL;DR
By investigating the intrinsic connection between inter-layer feature similarity and network sparsity, this paper proposes a sparsity regularization method based on Centered Kernel Alignment (CKA-SR). The method uses CKA to reduce feature similarity between network layers and thereby increase network sparsity, yielding consistent improvements across a variety of sparse-training methods, with the gains being most pronounced at extremely high sparsity ratios.
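The regularizer described above hinges on computing CKA between the features of two layers. As a rough sketch (not the paper's official code), linear CKA between two feature matrices can be written as follows; the function name `linear_cka` and the NumPy formulation are assumptions for illustration:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two feature matrices.

    X, Y: arrays of shape (n_samples, n_features), e.g. flattened
    activations of two layers on the same batch. Hypothetical helper;
    CKA-SR would penalize high CKA values between layers to push
    their representations apart and encourage sparsity.
    """
    # Center each feature dimension across the batch
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den
```

In a CKA-SR-style training loop, one would presumably add a weighted sum of pairwise `linear_cka` values over selected layer pairs to the task loss; identical representations give a CKA of 1, so minimizing this term drives inter-layer similarity down.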
Abstract
The sparsity of deep neural networks is well investigated to maximize the performance and reduce the size of overparameterized networks as much as possible. Existing methods focus on pruning parameters in the training pr…