Large Language Models (LLMs) have become pivotal in advancing the field of
artificial intelligence, yet their immense size poses significant challenges
for both fine-tuning and deployment. Current post-training pruning methods
reduce the size of LLMs but often fail to maintain their original
performance. To address these challenges, this paper introduces SPP, a
Sparsity-Preserved Parameter-efficient fine-tuning method. Unlike
existing post-training pruning approaches, which struggle with performance
retention, SPP employs lightweight learnable column and row matrices
to optimize sparse LLM weights while keeping the structure and sparsity of the
pruned pre-trained models intact. Through element-wise multiplication and
residual addition, SPP keeps the model's sparsity pattern and ratio consistent
during both training and weight merging. We demonstrate the
effectiveness of SPP by applying it to the LLaMA and LLaMA-2 model families
together with recent post-training pruning methods. Our results show that SPP
significantly enhances the performance of models with different sparsity
patterns (i.e., unstructured and N:M sparsity), especially those with high
sparsity ratios (e.g., 75%), making it a promising solution for the efficient
fine-tuning of sparse LLMs. Code will be made available at
this https URL
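Why element-wise multiplication plus residual addition preserves sparsity can be illustrated with a minimal Python sketch. It assumes a rank-one parameterization W' = W + W ⊙ (u vᵀ), where the column vector `u` and row vector `v` are hypothetical stand-ins for SPP's learnable column and row matrices; SPP's exact form may differ. Because the learned term is multiplied element-wise by the pruned weights before the residual addition, every pruned (zero) entry stays zero, so the merged weights keep the 2:4 pattern produced by the illustrative helper `n_m_prune`.

```python
# Hypothetical sketch: N:M magnitude pruning followed by an assumed
# sparsity-preserving update W' = W + W * (u v^T) (element-wise product),
# where u and v are illustrative learnable column/row vectors, not SPP's exact ones.

def n_m_prune(w, n=2, m=4):
    """Keep the n largest-magnitude weights in each group of m consecutive weights per row."""
    pruned = []
    for row in w:
        new_row = list(row)
        for start in range(0, len(row), m):
            group = list(range(start, min(start + m, len(row))))
            # Sort indices in this group from smallest to largest magnitude.
            group.sort(key=lambda j: abs(row[j]))
            for j in group[: max(0, len(group) - n)]:
                new_row[j] = 0.0
        pruned.append(new_row)
    return pruned


def sparsity_preserving_merge(w, u, v):
    """Return W + W * (u v^T): every update term is scaled by W, so zeros stay zero."""
    return [
        [w[i][j] + w[i][j] * (u[i] * v[j]) for j in range(len(w[i]))]
        for i in range(len(w))
    ]


def zero_pattern(w):
    """Boolean map of which entries are exactly zero."""
    return [[x == 0.0 for x in row] for row in w]


W = [[0.9, -0.1, 0.4, 0.05, -0.7, 0.2, 0.3, -0.6],
     [0.2, 0.8, -0.3, 0.1, 0.5, -0.05, 0.9, 0.4]]
W_sparse = n_m_prune(W, n=2, m=4)                  # 2:4 sparsity, 50% zeros
u = [0.5, -0.2]                                    # one entry per output row
v = [0.1, 0.3, -0.4, 0.2, 0.05, 0.6, -0.1, 0.25]   # one entry per input column
W_merged = sparsity_preserving_merge(W_sparse, u, v)
print(zero_pattern(W_merged) == zero_pattern(W_sparse))  # True
```

A surviving weight could only become zero if u_i v_j = -1 exactly; otherwise both the sparsity pattern and the ratio are unchanged after merging, so the merged matrix can be stored in the same sparse format as the pruned model.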

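Introduces a sparsity-preserving, parameter-efficient fine-tuning method that optimizes the weights of sparse large language models through lightweight learnable column and row matrices, preserving the structure and sparsity of the pruned pre-trained model and significantly improving the performance of sparse large language models.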

SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models

Existing methods for fine-tuning sparse LLMs often suffer from
resource-intensive requirements and high retraining costs. Additionally, many
fine-tuning methods rely on approximations or heuristic optimization
strategies, which may lead to suboptimal solutions. To address these issues, we
propose an efficient and fast framework for fine-tuning sparse LLMs based on
minimizing reconstruction error. Our approach samples a small calibration
dataset and uses backpropagation to iteratively optimize the reconstruction
error block by block, aiming for optimal solutions. Extensive experiments on
various benchmarks consistently demonstrate the superiority of our method over
other baselines. For instance, on the Wikitext2 dataset with LlamaV1-7B at 70%
sparsity, our proposed EBFT achieves a perplexity of 16.88, surpassing the
state-of-the-art DSnoT, which reaches a perplexity of 75.14. Moreover, at a
structured sparsity ratio of 26%, EBFT achieves a perplexity of 16.27,
outperforming LoRA (perplexity 16.44). Furthermore, fine-tuning LlamaV1-7B
with EBFT takes only about 30 minutes, and the entire framework can run on a
single 16GB GPU. The source code is available at this https URL
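Block-wise reconstruction fine-tuning can be sketched in plain Python under strong simplifying assumptions: a single linear map stands in for one transformer block, its gradient is derived by hand (backpropagation through one linear layer) instead of using autograd, and the calibration inputs are random rather than the outputs of preceding sparse blocks. The mask is held fixed so only unpruned weights move. The helper names (`magnitude_mask`, `finetune_block`, and so on), the loss, the learning rate and the step count are all illustrative choices, not EBFT's actual configuration.

```python
import random

# Hypothetical sketch of block-wise reconstruction fine-tuning: minimize the
# squared error between dense and sparse outputs of one "block" (here a
# linear map), updating only the weights that survived pruning.

def matvec(w, x):
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def magnitude_mask(w, sparsity=0.5):
    """Per-row mask that drops the smallest-magnitude fraction of weights."""
    mask = []
    for row in w:
        k = int(len(row) * sparsity)
        drop = set(sorted(range(len(row)), key=lambda j: abs(row[j]))[:k])
        mask.append([j not in drop for j in range(len(row))])
    return mask


def apply_mask(w, mask):
    return [[wij if keep else 0.0 for wij, keep in zip(row, mrow)]
            for row, mrow in zip(w, mask)]


def reconstruction_loss(w_dense, w_sparse, xs):
    """Mean (over samples) squared error between dense and sparse block outputs."""
    total = 0.0
    for x in xs:
        total += sum((a - b) ** 2 for a, b in zip(matvec(w_dense, x), matvec(w_sparse, x)))
    return total / len(xs)


def finetune_block(w_dense, w_sparse, mask, xs, lr=0.05, steps=200):
    """Gradient descent on the reconstruction loss; pruned weights are never updated."""
    w = [row[:] for row in w_sparse]
    n_out, n_in = len(w), len(w[0])
    for _ in range(steps):
        grad = [[0.0] * n_in for _ in range(n_out)]
        for x in xs:
            y_dense, y_sparse = matvec(w_dense, x), matvec(w, x)
            for i in range(n_out):
                err = y_sparse[i] - y_dense[i]
                for j in range(n_in):
                    # Derivative of err_i^2 w.r.t. w_ij, averaged over samples.
                    grad[i][j] += 2.0 * err * x[j] / len(xs)
        for i in range(n_out):
            for j in range(n_in):
                if mask[i][j]:  # masked update keeps the sparsity pattern fixed
                    w[i][j] -= lr * grad[i][j]
    return w


random.seed(0)
w_dense = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
mask = magnitude_mask(w_dense, sparsity=0.5)
w_sparse = apply_mask(w_dense, mask)
calib = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(16)]  # tiny calibration set
before = reconstruction_loss(w_dense, w_sparse, calib)
w_tuned = finetune_block(w_dense, w_sparse, mask, calib)
after = reconstruction_loss(w_dense, w_tuned, calib)
print(f"loss before={before:.4f} after={after:.4f}")
```

Because the loss is a convex quadratic in the surviving weights and the inputs lie in [-1, 1], the small learning rate guarantees that each step does not increase the loss; in a real model the same loop would run block by block, feeding each tuned block's output forward as the next block's input.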

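We propose an efficient and fast framework for fine-tuning sparse LLMs based on minimizing reconstruction error: a small dataset is sampled for calibration, and backpropagation is used to optimize the reconstruction error block by block in pursuit of optimal solutions. Extensive experiments on various benchmarks consistently show that our method outperforms other baselines.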

EBFT: Effective and Block-Wise Fine-Tuning for Sparse LLMs

The ever-increasing size of large language models (LLMs), though opening a
potential path toward artificial general intelligence, sadly places a daunting
obstacle in the way of their on-device deployment. As one of the most
well-established pre-LLM approaches to reducing model complexity, network
pruning appears to lag behind in the era of LLMs, mostly due to the costly
fine-tuning (or re-training) it requires given the massive volumes of model
parameters and training data. To close this industry-academia gap, we introduce
Dynamic Sparse No Training (DSnoT), a training-free fine-tuning approach that
slightly updates sparse LLMs without expensive backpropagation or any
weight updates. Inspired by Dynamic Sparse Training, DSnoT minimizes the
reconstruction error between the dense and sparse LLMs by performing iterative
weight pruning-and-growing on top of sparse LLMs. To this end, DSnoT takes into
account the anticipated reduction in reconstruction error for pruning and
growing, as well as the variance w.r.t. different input data for growing each
weight. This procedure runs efficiently in linear time because it obviates the
need for backpropagation to fine-tune LLMs. Extensive experiments on
LLaMA-V1/V2, Vicuna, and OPT across various benchmarks demonstrate the
effectiveness of DSnoT in enhancing the performance of sparse LLMs, especially
at high sparsity levels. For instance, DSnoT outperforms the state-of-the-art
Wanda by 26.79 perplexity at 70% sparsity with LLaMA-7B. Our paper offers fresh
insights into how to fine-tune sparse LLMs in an efficient, training-free manner
and opens new avenues to scale the great potential of sparsity to LLMs. Code is
available at this https URL
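The training-free pruning-and-growing idea can be sketched for a single output neuron. This is a simplified greedy variant inspired by the description above, not DSnoT's exact criteria: the expected reconstruction error is taken as the sum of w_j · E[x_j] over pruned weights; the grow step revives the pruned weight whose contribution has the same sign as that error, ranked by |w_j · E[x_j]| divided by the input variance; the prune step then removes the kept weight that leaves the smallest expected error, and the swap is kept only if the error shrinks. Weight values never change, only the mask, and the number of kept weights stays constant. All function names and the stopping rule are hypothetical.

```python
# Hypothetical, simplified pruning-and-growing for one output neuron.
# Only the mask changes; weight values are never updated and no gradients are used.

def column_stats(xs):
    """Per-input-feature mean and variance over calibration samples."""
    n, dims = len(xs), len(xs[0])
    mu = [sum(x[j] for x in xs) / n for j in range(dims)]
    var = [sum((x[j] - mu[j]) ** 2 for x in xs) / n for j in range(dims)]
    return mu, var


def expected_residual(w, mask, mu):
    """E[y_dense - y_sparse]: the expected contribution of the pruned weights."""
    return sum(w[j] * mu[j] for j in range(len(w)) if not mask[j])


def prune_and_grow(w, mask, xs, max_iters=10, eps=1e-8):
    mu, var = column_stats(xs)
    mask = list(mask)  # do not mutate the caller's mask
    r = expected_residual(w, mask, mu)
    for _ in range(max_iters):
        # Grow: a pruned weight whose contribution has the same sign as r shrinks |r|.
        grow = [j for j in range(len(w)) if not mask[j] and w[j] * mu[j] * r > 0]
        if not grow:
            break
        g = max(grow, key=lambda j: abs(w[j] * mu[j]) / (var[j] + eps))
        r_grown = r - w[g] * mu[g]
        # Prune: remove the kept weight that leaves the smallest expected residual.
        kept = [k for k in range(len(w)) if mask[k]]
        if not kept:
            break
        p = min(kept, key=lambda k: abs(r_grown + w[k] * mu[k]))
        r_new = r_grown + w[p] * mu[p]
        if abs(r_new) >= abs(r):
            break  # the swap would not help; keep the current mask
        mask[g], mask[p] = True, False
        r = r_new
    return mask, r


w = [1.0, 0.4, -0.6, 0.5]
calib = [[0, 2, 0, 1], [2, 0, 2, 3]]
mask0 = [True, True, False, False]
mask1, r1 = prune_and_grow(w, mask0, calib)
print(mask1, round(r1, 6))  # [True, False, False, True] -0.2
```

In this toy example, one swap (grow weight 3, prune weight 1) halves the expected error from 0.4 to -0.2 in magnitude, and the next candidate swap is rejected because it would raise the error to 1.4; each iteration costs time linear in the number of weights in the row.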

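A training-free fine-tuning method based on Dynamic Sparse No Training (DSnoT) that effectively improves the performance of sparse language models and opens up the potential of applying sparsity to large language models.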