Deep learning's immense capabilities are often constrained by the complexity of its models, leading to an increasing demand for effective sparsification techniques. Bayesian sparsification for deep learning emerges as a crucial approach, facilitating the design of models that are both computationally efficient and competitive in performance across various deep learning applications. The state of the art in Bayesian sparsification of deep neural networks combines structural shrinkage priors on model weights with an approximate inference scheme based on black-box stochastic variational inference. However, inverting the full hierarchical generative model is exceptionally computationally demanding, especially when compared to standard deep learning of point estimates. In this context, we advocate the use of Bayesian model reduction (BMR) as a more efficient alternative for pruning model weights. As a generalization of the Savage-Dickey ratio, BMR allows post-hoc elimination of redundant model weights based on the posterior estimates obtained under a straightforward (non-hierarchical) generative model. Our comparative study highlights the computational efficiency and the pruning rate of the BMR method relative to the established stochastic variational inference (SVI) scheme applied to the full hierarchical generative model. We illustrate the potential of BMR to prune model parameters across various deep learning architectures, from classical networks like LeNet to modern frameworks such as Vision Transformers and MLP-Mixers.
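To make the link between BMR and the Savage-Dickey ratio concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single weight with a Gaussian (mean-field) posterior and Gaussian full and reduced priors, for which the change in log evidence has a closed form. The function names (`bmr_delta_f`, `savage_dickey_log_ratio`, `should_prune`), the zero-centred reduced prior, its default variance, and the pruning threshold of zero are all illustrative assumptions.

```python
import math


def bmr_delta_f(post_mu, post_var, prior_var, reduced_var,
                prior_mu=0.0, reduced_mu=0.0):
    """Change in log evidence when the full Gaussian prior is replaced by a
    reduced Gaussian prior, given a Gaussian posterior under the full prior.

    Computes log of the integral of q(w) * p_reduced(w) / p_full(w) over w.
    """
    pi_q = 1.0 / post_var      # posterior precision under the full prior
    pi_0 = 1.0 / prior_var     # full prior precision
    pi_r = 1.0 / reduced_var   # reduced prior precision

    # Posterior precision and mean implied by the reduced prior.
    pi_new = pi_q + pi_r - pi_0
    if pi_new <= 0.0:
        raise ValueError("reduced posterior precision must be positive")
    mu_new = (pi_q * post_mu + pi_r * reduced_mu - pi_0 * prior_mu) / pi_new

    log_det = 0.5 * math.log(pi_r * pi_q / (pi_0 * pi_new))
    quad = 0.5 * (pi_q * post_mu ** 2 + pi_r * reduced_mu ** 2
                  - pi_0 * prior_mu ** 2 - pi_new * mu_new ** 2)
    return log_det - quad


def savage_dickey_log_ratio(post_mu, post_var, prior_var, prior_mu=0.0):
    """log q(w=0) - log p(w=0): the limit of BMR for a point-mass reduced prior."""
    def log_normal_at_zero(mu, var):
        return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * mu ** 2 / var
    return (log_normal_at_zero(post_mu, post_var)
            - log_normal_at_zero(prior_mu, prior_var))


def should_prune(post_mu, post_var, prior_var, reduced_var=1e-12):
    """Prune when the reduced model (weight pinned near zero) has higher evidence.

    The threshold of zero is an assumption made for this sketch.
    """
    return bmr_delta_f(post_mu, post_var, prior_var, reduced_var) > 0.0


# A weight whose posterior sits near zero is pruned; one far from zero is kept.
print(should_prune(post_mu=0.01, post_var=0.01, prior_var=1.0))  # True
print(should_prune(post_mu=2.0, post_var=0.01, prior_var=1.0))   # False
```

Because the score needs only the posterior mean and variance of each weight, it can be evaluated after training without re-fitting the model, which is the sense in which the pruning is post hoc. As the reduced prior variance shrinks toward zero, `bmr_delta_f` approaches `savage_dickey_log_ratio`.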

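The complexity of deep learning models limits their considerable potential, creating a need for efficient sparsification techniques. Bayesian sparsification is a key approach that enables models that are both computationally efficient and competitive in performance across deep learning applications. This work argues that Bayesian model reduction is a more efficient way to prune model parameters, offering better computational efficiency and pruning rates than the existing scheme based on stochastic variational inference. The approach is validated on a range of deep learning architectures, from classical networks such as LeNet to modern frameworks such as Vision Transformers and MLP-Mixers.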

Bayesian sparsification for deep neural networks with Bayesian model reduction