BriefGPT.xyz
Oct, 2023
QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models
Jing Liu, Ruihao Gong, Xiuying Wei, Zhiwei Dong, Jianfei Cai...
TL;DR
With an adaptive channel reassembly technique, QLLM provides an accurate and efficient low-bitwidth quantization method for large language models, improving average accuracy on LLaMA-2 by 7.89% over the previous state-of-the-art method.
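The TL;DR centers on adaptive channel reassembly. Below is a minimal NumPy sketch of the channel-disassembly half of that idea: an outlier activation channel is split into several equivalent sub-channels while the matching weight row is replicated, so the layer output is unchanged but the per-channel magnitude a low-bit quantizer must cover shrinks. The function name, threshold, and max-based splitting rule are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def disassemble_outlier_channels(x, w, threshold):
    """Split activation channels whose peak magnitude exceeds `threshold`
    into k equal sub-channels and replicate the matching weight rows,
    so that x_new @ w_new == x @ w (up to floating-point error).
    Illustrative sketch only; not the paper's exact procedure."""
    x_cols, w_rows = [], []
    for j in range(x.shape[1]):
        col, row = x[:, j], w[j, :]
        # number of sub-channels needed to bring this channel under the threshold
        k = max(1, int(np.ceil(np.abs(col).max() / threshold)))
        for _ in range(k):
            x_cols.append(col / k)   # each sub-channel carries 1/k of the magnitude
            w_rows.append(row)       # the weight row is simply duplicated
    return np.stack(x_cols, axis=1), np.stack(w_rows, axis=0)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
x[:, 3] *= 50.0                      # an artificial outlier channel
w = rng.normal(size=(8, 16))

x2, w2 = disassemble_outlier_channels(x, w, threshold=5.0)
print(np.allclose(x @ w, x2 @ w2))        # True: the layer output is preserved
print(np.abs(x).max(), np.abs(x2).max())  # the activation range is much smaller
```

With the outlier magnitude spread across sub-channels, a per-tensor low-bit quantizer needs a far smaller dynamic range; the paper's full method additionally merges similar channels afterwards to restore the original channel count.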
Abstract
Large language models (LLMs) excel in NLP, but their demands hinder their widespread deployment. While quantization-aware training (QAT) offers a solution, its extensive training costs make post-training quantization (PTQ) a more practical approach for LLMs.