Quantization has emerged as a promising technique for improving the memory and computational efficiency of large language models (LLMs). Though the trade-off between performance and efficiency is well-known, there is still much to be learned about the relationship between quantization and LLM performance. To shed light on this relationship, we propose a new perspective on quantization, viewing it as perturbations added to the weights and activations of LLMs. We call this approach "the lens of perturbation". Using this lens, we conduct experiments with various artificial perturbations to explore their impact on LLM performance. Our findings reveal several connections between the properties of perturbations and LLM performance, providing insights into the failure cases of uniform quantization and suggesting potential solutions to improve the robustness of LLM quantization. To demonstrate the significance of our findings, we implement a simple non-uniform quantization approach based on our insights. Our experiments show that this approach achieves minimal performance degradation on both 4-bit weight quantization and 8-bit quantization for weights and activations. These results validate the correctness of our approach and highlight its potential to improve the efficiency of LLMs without sacrificing performance.

量化作为一种改善大型语言模型的存储和计算效率的有前途的技术，本研究以新的扰动视角，研究了量化与大型语言模型性能之间的关系，并发现了扰动特性与性能之间的联系，提供了改善模型量化鲁棒性的潜在解决方案，并在实验证明了基于这一视角的简单非均匀量化方法在权重和激活量化方面都能达到较小的性能损失，以此改善大型语言模型的效率而不牺牲性能。

大型语言模型量化之困: 基于扰动视角的实证研究