This work suggests fundamentally rethinking the current practice of pruning large language models (LLMs). That practice follows a divide-and-conquer scheme: split the model into submodels, prune them sequentially, and reconstruct the predictions of their dense counterparts on a small calibration set, one submodel at a time. The final model is obtained simply by putting the resulting sparse submodels together. While this approach enables pruning under memory constraints, it incurs high reconstruction errors. In this work, we first present an array of reconstruction techniques that can reduce this error by more than $90\%$. However, we discover that minimizing reconstruction error is not always ideal: it can overfit the given calibration data, resulting in increased language perplexity and poor performance on downstream tasks. We find that a strategy of self-generating calibration data can mitigate this trade-off between reconstruction and generalization, suggesting new directions in light of both the benefits and the pitfalls of reconstruction for pruning LLMs.
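To make the divide-and-conquer scheme concrete, below is a minimal NumPy sketch for a toy stack of linear layers with ReLU activations. It illustrates the general idea only and is not the reconstruction techniques of this work. The per-row magnitude mask, the least-squares re-fit of the surviving weights, the choice to feed each sparse layer's output forward as the next layer's calibration input, and the helper names (`magnitude_mask`, `reconstruct`, `rel_error`, `prune_sequentially`) are all hypothetical assumptions made for the illustration.

```python
import numpy as np


def magnitude_mask(W, sparsity):
    """Boolean mask keeping the largest-|w| entries in each row (assumed pruning rule)."""
    k = int(round(W.shape[1] * (1.0 - sparsity)))
    order = np.argsort(-np.abs(W), axis=1)[:, :k]
    mask = np.zeros(W.shape, dtype=bool)
    np.put_along_axis(mask, order, True, axis=1)
    return mask


def reconstruct(W, X, mask):
    """Re-fit kept weights row by row: min ||W_s X - W X||_F with W_s zero off the mask."""
    target = W @ X  # dense submodel's outputs on the calibration inputs
    W_s = np.zeros_like(W)
    for i in range(W.shape[0]):
        keep = mask[i]
        w, *_ = np.linalg.lstsq(X[keep].T, target[i], rcond=None)
        W_s[i, keep] = w
    return W_s


def rel_error(W, W_s, X):
    """Relative reconstruction error of W_s against W on calibration inputs X."""
    return np.linalg.norm(W @ X - W_s @ X) / np.linalg.norm(W @ X)


def prune_sequentially(layers, X, sparsity):
    """Prune one layer at a time; the sparse model is the list of pruned layers."""
    pruned, errs = [], []
    for W in layers:
        mask = magnitude_mask(W, sparsity)
        W_s = reconstruct(W, X, mask)
        # (error of masking alone, error after least-squares reconstruction)
        errs.append((rel_error(W, W * mask, X), rel_error(W, W_s, X)))
        pruned.append(W_s)
        X = np.maximum(W_s @ X, 0.0)  # ReLU; sparse output becomes next calibration input
    return pruned, errs


rng = np.random.default_rng(0)
layers = [rng.standard_normal((64, 32)), rng.standard_normal((32, 64))]
X_calib = rng.standard_normal((32, 256))  # 256 calibration samples
pruned, errs = prune_sequentially(layers, X_calib, sparsity=0.5)
for naive, rec in errs:
    print(f"mask only: {naive:.3f}  reconstructed: {rec:.3f}")
```

Because the masked original weights are one feasible point of each least-squares problem, the reconstructed error can never exceed the mask-only error on the calibration set. That guarantee holds only on the columns of `X`, however: when the calibration set has few samples relative to the number of kept weights per row, least squares can fit the calibration outputs almost exactly without that fit carrying over to unseen inputs, which is the kind of overfitting described above.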

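Working within the pipeline of splitting the model, pruning the submodels sequentially, reconstructing the predictions of their dense counterparts, and then merging the resulting sparse submodels, this paper first presents a series of reconstruction techniques that substantially reduce the high reconstruction error. It then shows that minimizing reconstruction error is not always ideal, and introduces a strategy of self-generating calibration data to balance the trade-off between reconstruction and generalization, offering new directions for pruning large language models.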

Rethinking Pruning Large Language Models: Benefits and Pitfalls of Reconstruction Error Minimization