Large Language Models (LLMs), renowned for their remarkable performance, are
difficult to deploy in practice because of their colossal model size. In
response to this challenge, efforts have been directed toward
the application of traditional network pruning techniques to LLMs, revealing
that a massive number of parameters can be pruned in one shot without hurting
performance. Building upon insights gained from pre-LLM models, prevailing LLM
pruning strategies have consistently adhered to the practice of uniformly
pruning all layers at equivalent sparsity. However, this practice stands in
contrast to the prevailing trends observed in the field of vision models, where
non-uniform layerwise sparsity typically yields substantially improved results.
To elucidate the underlying reasons for this disparity, we conduct a
comprehensive analysis of the distribution of token features within LLMs. In
doing so, we discover a strong correlation with the emergence of outliers,
defined as features exhibiting significantly greater magnitudes than
their counterparts in feature dimensions. Inspired by this finding, we
introduce a novel LLM pruning methodology that incorporates a tailored set of
non-uniform layerwise sparsity ratios specifically designed for LLM pruning,
termed Outlier Weighed Layerwise sparsity (OWL). The sparsity ratio of OWL
is directly proportional to the outlier ratio observed within each layer,
facilitating a more effective alignment between layerwise weight sparsity and
outlier ratios. Our empirical evaluation, conducted across the LLaMA-V1 family
and OPT, spanning various benchmarks, demonstrates the distinct advantages
offered by OWL over previous methods. For instance, at a high sparsity level of
70%, our approach surpasses the state-of-the-art Wanda and SparseGPT by 61.22
and 6.80 perplexity, respectively.
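
To make the notion of a layerwise outlier ratio concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that an outlier is an entry whose magnitude exceeds a fixed multiple of the layer's mean magnitude. The function name `outlier_ratio`, the parameter `threshold_multiplier`, its default of 5.0, and the toy layer values are all hypothetical and are not taken from the abstract.

```python
from statistics import fmean


def outlier_ratio(magnitudes, threshold_multiplier=5.0):
    """Return the fraction of entries whose magnitude exceeds
    threshold_multiplier times the mean magnitude of the layer.

    The mean-based cutoff is an assumption made for illustration only.
    """
    values = [abs(v) for v in magnitudes]
    if not values:
        raise ValueError("magnitudes must be non-empty")
    cutoff = threshold_multiplier * fmean(values)
    return sum(v > cutoff for v in values) / len(values)


# Toy feature magnitudes for two hypothetical layers.
layers = {
    "layer_0": [1.0] * 98 + [60.0, 80.0],  # two large-magnitude features
    "layer_1": [1.0] * 100,                # no outliers
}

# One outlier ratio per layer; these are the per-layer quantities that a
# non-uniform sparsity schedule would take as input.
ratios = {name: outlier_ratio(vals) for name, vals in layers.items()}
```

In this toy setting only `layer_0` contains entries above the cutoff, so the two layers receive different outlier ratios. How those ratios are mapped to per-layer sparsity is left out here, since the abstract does not specify the exact rule.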

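The enormous size of Large Language Models (LLMs) poses a challenge for
practical deployment. In response, traditional network pruning techniques have
been applied to LLMs, showing that a large number of parameters can be pruned
without hurting performance. Drawing on insights from pre-LLM models, existing
LLM pruning methods prune all layers uniformly, whereas non-uniform layerwise
sparsity typically yields better results in vision models. To clarify the
underlying reasons for this difference, we conduct a comprehensive analysis of
the feature distribution within LLMs. On this basis, we propose a new LLM
pruning method that uses a set of non-uniform layerwise sparsity ratios
designed specifically for LLM pruning, called Outlier Weighed Layerwise
sparsity (OWL). The sparsity ratio of OWL is proportional to the outlier ratio
observed within each layer, aligning layerwise weight sparsity more effectively
with outlier ratios. Our empirical evaluation shows that OWL has clear
advantages over previous methods. For example, at a high sparsity level of 70%,
our approach surpasses the state-of-the-art Wanda and SparseGPT by 61.22 and
6.80 perplexity, respectively.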