BriefGPT.xyz
Feb, 2024
A Comprehensive Evaluation of Quantization Strategies for Large Language Models
Renren Jin, Jiangcun Du, Wuwei Huang, Wei Liu, Jian Luan...
TL;DR
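This study evaluates quantization techniques for large language models and finds that 4-bit quantized models can retain performance comparable to their non-quantized counterparts on most benchmarks, and that perplexity can serve as a proxy metric for quantized LLMs. However, quantization also affects inference speed, so substantial engineering effort and hardware support are needed to optimize decoding speed and memory consumption.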
Abstract
Increasing the number of parameters in large language models (LLMs) usually improves performance in downstream tasks but raises compute and memory costs, making deployment difficult in resource-limited settings.