This work studies post-training parameter quantization in large language models (LLMs). We introduce quantization with incoherence processing (QuIP), a new method based on the insight that quantization benefits from incoherent weight and Hessian matrices, i.e., from the weights and the directions in which it is important to round them accurately being unaligned with the coordinate axes. QuIP consists of two steps: (1) an adaptive rounding procedure minimizing a quadratic proxy objective; (2) efficient pre- and post-processing that ensures weight and Hessian incoherence via multiplication by random orthogonal matrices. We complement QuIP with the first theoretical analysis for an LLM-scale quantization algorithm, and show that our theory also applies to an existing method, OPTQ. Empirically, we find that our incoherence preprocessing improves several existing quantization algorithms and yields the first LLM quantization methods that produce viable results using only two bits per weight. Our code can be found at https://github.com/jerry-chee/QuIP .
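The two steps can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the rotation convention (`U W V^T` for weights, `V H V^T` for the Hessian), the trace form of the quadratic proxy, the dense QR-based orthogonal matrices, and the plain nearest-grid rounding used in place of adaptive rounding are all assumptions made for illustration, and every function name is hypothetical. The point it shows is that orthogonal pre- and post-processing leaves the quadratic proxy objective unchanged, so rounding can be done in the rotated (incoherent) basis and then mapped back.

```python
import numpy as np


def random_orthogonal(n, rng):
    """Draw an n x n random orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Fix column signs so the result does not depend on the QR sign convention.
    return q * np.sign(np.diag(r))


def proxy_loss(w_hat, w, h):
    """Assumed quadratic proxy: tr((W_hat - W) H (W_hat - W)^T)."""
    d = w_hat - w
    return float(np.trace(d @ h @ d.T))


def round_to_grid(w, bits):
    """Nearest rounding to a uniform grid with 2**bits levels.

    Stand-in for the adaptive rounding step, which is not reproduced here.
    """
    lo, hi = w.min(), w.max()
    scale = (hi - lo) / (2**bits - 1)
    if scale == 0:
        return w.copy()
    return np.round((w - lo) / scale) * scale + lo


def incoherence_preprocess(w, h, rng):
    """Rotate weights and Hessian by random orthogonal matrices (assumed convention)."""
    u = random_orthogonal(w.shape[0], rng)
    v = random_orthogonal(w.shape[1], rng)
    return u @ w @ v.T, v @ h @ v.T, u, v


def incoherence_postprocess(w_rot_hat, u, v):
    """Undo the rotation applied in incoherence_preprocess."""
    return u.T @ w_rot_hat @ v


rng = np.random.default_rng(0)
w = rng.standard_normal((8, 16))           # toy weight matrix
x = rng.standard_normal((64, 16))          # toy calibration inputs
h = x.T @ x / x.shape[0]                   # toy symmetric PSD Hessian proxy

w_rot, h_rot, u, v = incoherence_preprocess(w, h, rng)
w_rot_hat = round_to_grid(w_rot, bits=2)   # quantize in the rotated basis
w_hat = incoherence_postprocess(w_rot_hat, u, v)

# The proxy loss is the same whether measured in the rotated or original basis.
loss_rotated = proxy_loss(w_rot_hat, w_rot, h_rot)
loss_original = proxy_loss(w_hat, w, h)
```

Comparing `loss_rotated` with `loss_original` confirms the invariance numerically. Dense QR costs cubic time in the layer dimension, so a practical implementation would need a cheaper way to generate and apply the orthogonal matrices, which is what the abstract's word "efficient" refers to.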

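This study examines post-training parameter quantization in large language models. It introduces quantization with incoherence processing (QuIP) and finds that the method is effective at reducing the quantization error associated with the weight and Hessian matrices. An optimized rounding procedure, combined with pre- and post-processing by random orthogonal matrices, improves results further and yields LLM quantization methods that use only two bits per weight.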

QuIP: 2-Bit Quantization of Large Language Models With Guarantees