While large language models (LLMs) excel in many domains, their complexity
and scale challenge deployment in resource-limited environments. Current
compression techniques, such as parameter pruning, often fail to effectively
utilize the knowledge from pruned parameters. To address these challenges, we
propose Manifold-Based Knowledge Alignment and Layer Merging Compression (MKA),
a novel approach that uses manifold learning and the Normalized Pairwise
Information Bottleneck (NPIB) measure to merge similar layers, reducing model
size while preserving essential performance. We evaluate MKA on multiple
benchmark datasets and various LLMs. Our findings show that MKA not only
preserves model performance but also achieves substantial compression ratios,
outperforming traditional pruning methods. Moreover, when coupled with
quantization, MKA delivers even greater compression. Specifically, on the MMLU
dataset using the Llama3-8B model, MKA achieves a compression ratio of 43.75%
with a minimal performance decrease of only 2.82%. The proposed MKA method
offers a resource-efficient and performance-preserving model compression
technique for LLMs.
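
The layer-merging idea can be illustrated with a minimal sketch. It is not the
MKA procedure: cosine similarity between toy layer representations stands in
for the NPIB measure on manifold embeddings, and the merge is a plain weighted
average with a hypothetical mixing weight `alpha`. All function names and the
toy data are assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def most_similar_adjacent(reps):
    """Return (index, score) of the adjacent layer pair with the highest similarity."""
    scores = [cosine(reps[i], reps[i + 1]) for i in range(len(reps) - 1)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

def merge_layers(weights, i, alpha=0.5):
    """Replace layers i and i+1 with alpha * W_i + (1 - alpha) * W_{i+1}."""
    merged = [alpha * a + (1 - alpha) * b for a, b in zip(weights[i], weights[i + 1])]
    return weights[:i] + [merged] + weights[i + 2:]

# Toy model (hypothetical values): 4 layers, each with a flattened weight
# vector and a representation vector used only to measure similarity.
weights = [[1.0, 0.0], [0.0, 2.0], [0.0, 4.0], [3.0, 3.0]]
reps = [[1.0, 0.0], [0.0, 1.0], [0.1, 1.0], [1.0, 1.0]]

i, score = most_similar_adjacent(reps)   # layers 1 and 2 are the closest pair
compressed = merge_layers(weights, i)    # 4 layers become 3
layers_removed = len(weights) - len(compressed)
```

Repeating the select-and-merge step removes one layer per iteration. For scale,
removing 14 of 32 layers would give 14/32 = 43.75%, which matches the reported
figure if the ratio is counted in layers; the abstract does not say how it is
counted.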

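MKA (Manifold-Based Knowledge Alignment and Layer Merging Compression) uses manifold learning and the Normalized Pairwise Information Bottleneck measure to reduce model size while preserving performance. It achieves substantial compression ratios across multiple benchmark datasets and a range of large language models, compresses further when combined with quantization, and offers a resource-efficient, performance-preserving compression technique for LLMs.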

Compressing LLMs by Layer Merging Based on Manifold Alignment

Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging

The emerging success of large language models (LLMs) heavily relies on
collecting abundant training data from external (untrusted) sources. Despite
substantial efforts devoted to data cleaning and curation, well-constructed
LLMs have been reported to suffer from copyright infringement, data poisoning,
and/or privacy violations, which would impede practical deployment of LLMs. In
this study, we propose a simple and easily implementable method for purifying
LLMs from the negative effects caused by uncurated data, namely, through
ensembling LLMs with benign and small language models (SLMs). Aside from
theoretical guarantees, we perform comprehensive experiments to empirically
confirm the efficacy of ensembling LLMs with SLMs, which can effectively
preserve the performance of LLMs while mitigating issues such as copyright
infringement, data poisoning, and privacy violations.
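
As a rough illustration of ensembling at the output level, the sketch below
mixes the next-token distributions of an LLM and an SLM. The abstract does not
specify the ensembling rule, so the linear mixture, the `weight` parameter, and
the toy probabilities are all assumptions rather than the paper's algorithm.

```python
def ensemble(p_llm, p_slm, weight=0.5):
    """Mix two next-token distributions given as {token: probability} dicts.

    `weight` is the share given to the LLM (a hypothetical parameter).
    Tokens missing from one distribution are treated as probability 0.
    """
    vocab = set(p_llm) | set(p_slm)
    mixed = {
        t: weight * p_llm.get(t, 0.0) + (1 - weight) * p_slm.get(t, 0.0)
        for t in vocab
    }
    total = sum(mixed.values())
    return {t: p / total for t, p in mixed.items()}  # renormalize

# Toy case: the LLM strongly prefers a memorized token; the benign SLM does not.
p_llm = {"secret": 0.9, "the": 0.1}
p_slm = {"secret": 0.1, "the": 0.9}
mixed = ensemble(p_llm, p_slm)
```

In this toy case the mixture lowers the probability of the token the LLM has
memorized, which is the intuition behind pairing the LLM with a model that
never saw the problematic data.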

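We propose a simple, easily implemented method that purifies large language models (LLMs) of the negative effects of uncurated data by ensembling them with benign small language models (SLMs), mitigating copyright infringement, data poisoning, and privacy violations. Comprehensive experiments confirm that the method effectively preserves LLM performance.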

Purifying Large Language Models by Combining Them with a Small Language Model

Purifying Large Language Models by Ensembling a Small Language Model

In most neural machine translation distillation or stealing scenarios, the
goal is to preserve the performance of the target model (teacher). The
highest-scoring hypothesis of the teacher model is commonly used to train a new
model (student). If reference translations are also available, then better
hypotheses (with respect to the references) can be upsampled and poor
hypotheses either removed or undersampled.
This paper explores the importance sampling method landscape (pruning,
hypothesis upsampling and undersampling, deduplication and their combination)
with English to Czech and English to German MT models using standard MT
evaluation metrics. We show that careful upsampling and combination with the
original data leads to better performance when compared to training only on the
original or synthesized data or their direct combination.
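
The upsampling-and-combination step might look like the following sketch. The
function names, the `top_k` and `repeat` parameters, and the scores are
hypothetical; the abstract does not give the selection rule or the upsampling
factors used in the experiments.

```python
def upsample(hypotheses, top_k=1, repeat=2):
    """Keep the top_k hypotheses by score and repeat each one `repeat` times.

    hypotheses: list of (translation, score) pairs, where the score compares
    the hypothesis against the reference and higher is better.
    """
    ranked = sorted(hypotheses, key=lambda h: h[1], reverse=True)
    return [text for text, _score in ranked[:top_k] for _ in range(repeat)]

def build_training_set(original, teacher_hyps, top_k=1, repeat=2):
    """Combine the original parallel data with upsampled teacher hypotheses."""
    data = list(original)
    for source, hyps in teacher_hyps.items():
        data += [(source, t) for t in upsample(hyps, top_k, repeat)]
    return data

# Toy English-to-German data with made-up metric scores.
original = [("Good morning", "Guten Morgen")]
teacher_hyps = {
    "Good morning": [
        ("Guten Morgen", 0.9),
        ("Guter Morgen", 0.5),
        ("Morgen gut", 0.1),
    ]
}
train = build_training_set(original, teacher_hyps, top_k=2, repeat=2)
```

Here the lowest-scoring hypothesis is pruned, the two best are each seen twice,
and the original pair is kept, giving five training examples for the student.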

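This paper explores importance sampling for knowledge distillation in neural machine translation, covering pruning, hypothesis upsampling and undersampling, deduplication, and their combinations. English-to-German and English-to-Czech translation models are trained and evaluated with standard translation quality metrics. The results show that careful upsampling combined with the original data yields better performance.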