BriefGPT.xyz
May 2023
Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models
Yijia Zhang, Lingran Zhao, Shijie Cao, Wenqiang Wang, Ting Cao...
TL;DR
This work is a comparative study of INT and FP low-bit quantization for LLMs. It finds that, because tensor distributions are complex and heterogeneous, the optimal quantization format varies from layer to layer, and proposes MoFQ (Mixture of Formats Quantization), a simple and practical method that achieves state-of-the-art results across a variety of tasks and delivers significant performance improvements without introducing hardware overhead.
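To make the layer-wise format-selection idea concrete, here is a minimal sketch, not the paper's implementation: it quantizes a weight tensor with both symmetric INT4 and a toy FP4 (E2M1) value grid, measures reconstruction MSE, and keeps whichever format fits the tensor's distribution better. The function names, the E2M1 grid, and the MSE criterion are illustrative assumptions; the paper's actual selection metric and format details may differ.

```python
# Sketch of per-layer format selection in the spirit of MoFQ.
# Assumptions (not from the paper): symmetric INT4, a toy FP4 (E2M1)
# grid, and per-tensor MSE as the criterion for picking a format.
import numpy as np

def quantize_int4(w: np.ndarray) -> np.ndarray:
    """Symmetric INT4: 16 uniform levels scaled to the tensor's range."""
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -8, 7)
    return q * scale  # dequantized values

def quantize_fp4(w: np.ndarray) -> np.ndarray:
    """Toy FP4 (E2M1): round each weight to the nearest representable value."""
    # Representable magnitudes for a 1-2-1 format: {0, 0.5, 1, 1.5, 2, 3, 4, 6}.
    grid = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
    scale = np.abs(w).max() / grid.max()
    mags = np.abs(w) / scale
    idx = np.abs(mags[..., None] - grid).argmin(axis=-1)
    return np.sign(w) * grid[idx] * scale

def pick_format(w: np.ndarray) -> str:
    """Choose whichever 4-bit format reconstructs this tensor with lower MSE."""
    err_int = np.mean((w - quantize_int4(w)) ** 2)
    err_fp = np.mean((w - quantize_fp4(w)) ** 2)
    return "INT4" if err_int <= err_fp else "FP4"

# Layers with different weight distributions may prefer different formats:
# a flat distribution suits INT4's uniform levels, while an outlier-heavy
# one can favor FP4's non-uniform grid.
rng = np.random.default_rng(0)
uniform_layer = rng.uniform(-1, 1, size=(256, 256))
heavy_tail_layer = rng.standard_t(df=2, size=(256, 256))
print(pick_format(uniform_layer), pick_format(heavy_tail_layer))
```

Because both candidate formats use the same bit width, choosing per layer changes only the interpretation of the stored bits, which is consistent with the TL;DR's claim of no added hardware overhead.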
Abstract
Efficient deployment of large language models (LLMs) necessitates low-bit quantization to minimize model size and inference cost. While low-bit integer formats (e.g., INT8/INT4) have been the conventional choice, …
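As background for the "conventional choice" the abstract refers to, below is a minimal sketch of symmetric per-tensor INT8 round-to-nearest quantization and the model-size reduction it buys; every detail here is standard practice rather than a specific of this paper.

```python
# Sketch of conventional symmetric INT8 weight quantization.
# Per-tensor scaling and round-to-nearest are common practice,
# not specifics taken from this paper.
import numpy as np

w = np.random.randn(4096, 4096).astype(np.float32)  # one FP32 weight matrix

scale = np.abs(w).max() / 127.0          # per-tensor scale for INT8
q = np.round(w / scale).astype(np.int8)  # stored 8-bit weights
w_hat = q.astype(np.float32) * scale     # dequantized at inference time

print(f"FP32 size: {w.nbytes / 2**20:.1f} MiB")   # 64.0 MiB
print(f"INT8 size: {q.nbytes / 2**20:.1f} MiB")   # 16.0 MiB, 4x smaller
print(f"mean abs error: {np.abs(w - w_hat).mean():.5f}")
```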