Aug, 2023
FPTQ: Fine-grained Post-Training Quantization for Large Language Models
Qingyuan Li, Yifan Zhang, Liang Li, Peng Yao, Bo Zhang...
TL;DR
This work proposes a novel W4A8 post-training quantization method that combines the strengths of two existing techniques, achieving 4-bit weight quantization together with 8-bit matrix-computation acceleration. It attains state-of-the-art W4A8 quantization performance on multiple standard benchmarks, making the practical deployment of large language models feasible.
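The W4A8 scheme described above quantizes weights to 4 bits and activations to 8 bits so that matrix multiplication can run on fast low-precision kernels. As a minimal illustrative sketch (not the paper's actual method, which uses finer-grained, layerwise strategies), the following NumPy code shows per-group symmetric 4-bit weight quantization, per-tensor 8-bit activation quantization, and a dequantize-and-multiply reference; the function names and group size are assumptions for illustration:

```python
import numpy as np

def quantize_weights_w4(w, group_size=64):
    """Per-group symmetric 4-bit weight quantization (illustrative)."""
    out_features, in_features = w.shape
    wg = w.reshape(out_features, in_features // group_size, group_size)
    # int4 symmetric range is [-8, 7]; scale maps the group max to 7
    scale = np.maximum(np.abs(wg).max(axis=-1, keepdims=True) / 7.0, 1e-8)
    q = np.clip(np.round(wg / scale), -8, 7).astype(np.int8)
    return q, scale

def quantize_activations_a8(x):
    """Per-tensor symmetric 8-bit activation quantization (illustrative)."""
    scale = max(np.abs(x).max() / 127.0, 1e-8)
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def w4a8_matmul(x, w, group_size=64):
    """Reference: dequantize then multiply. Real W4A8 kernels instead
    unpack int4 weights to int8 and run an int8 GEMM for speed."""
    qw, sw = quantize_weights_w4(w, group_size)
    qx, sx = quantize_activations_a8(x)
    w_dq = (qw.astype(np.float32) * sw).reshape(w.shape)
    x_dq = qx.astype(np.float32) * sx
    return x_dq @ w_dq.T

# Compare the quantized matmul against full precision on random data
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 128)).astype(np.float32)
w = rng.standard_normal((32, 128)).astype(np.float32)
y_ref = x @ w.T
y_q = w4a8_matmul(x, w)
err = np.abs(y_ref - y_q).mean()
```

Smaller group sizes reduce quantization error (each scale covers fewer weights) at the cost of storing more scales, which is the core fineness/overhead trade-off in fine-grained quantization.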
Abstract
In the era of large-scale language models, the substantial parameter size poses significant challenges for deployment. Being a prevalent compression technique, quantization has emerged as the mainstream practice …