Feb, 2024
APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models
Ziyi Guan, Hantao Huang, Yupeng Su, Hong Huang, Ngai Wong...
TL;DR
This work introduces APTQ (Attention-aware Post-Training Mixed-Precision Quantization), a mixed-precision quantization method for large language models that uses the Hessian trace as a sensitivity metric to reduce precision while preserving model performance, achieving results superior to previous quantization methods.
Abstract
Large language models (LLMs) have greatly advanced the natural language processing paradigm. However, the high computational load and huge model sizes pose a grand challenge for deployment on edge devices. To this end, we propose APTQ (Attention-aware Post-Training Mixed-Precision Quantization) for LLMs.
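
To make the TL;DR's core idea concrete, here is a minimal sketch, not the authors' implementation, of using a Hutchinson estimate of the per-layer Hessian trace as a sensitivity metric and then assigning higher bit-widths to more sensitive layers. The `hessian_trace` and `allocate_bits` helpers, the 4-/2-bit split, and the `frac_high` cutoff are all illustrative assumptions.

```python
# Minimal sketch: Hessian-trace sensitivity + mixed-precision bit allocation.
# Hypothetical helpers for illustration, not the APTQ authors' code.
import torch

def hessian_trace(loss, params, n_samples=8):
    """Hutchinson estimator of tr(H): E[v^T H v] over Rademacher vectors v."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    est = 0.0
    for _ in range(n_samples):
        # Rademacher probe vectors (+1/-1), one per parameter tensor.
        vs = [torch.randint_like(p, 2) * 2 - 1 for p in params]
        gv = sum((g * v).sum() for g, v in zip(grads, vs))
        # Hessian-vector product via a second backward pass.
        hvs = torch.autograd.grad(gv, params, retain_graph=True)
        est += sum((hv * v).sum().item() for hv, v in zip(hvs, vs))
    return est / n_samples

def allocate_bits(traces, high_bits=4, low_bits=2, frac_high=0.5):
    """Assign higher precision to layers with larger Hessian trace."""
    order = sorted(traces, key=traces.get, reverse=True)
    cutoff = int(len(order) * frac_high)
    return {name: (high_bits if i < cutoff else low_bits)
            for i, name in enumerate(order)}
```

In practice one would compute `hessian_trace` for each layer's weights on a small calibration batch and feed the resulting name-to-trace dict into `allocate_bits`; the attention-aware component of APTQ, which accounts for the nonlinear effect of attention outputs, is not modeled in this sketch.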