Jul, 2024
GPTQT: Quantize Large Language Models Twice to Push the Efficiency
Yipin Guo, Yilin Lang, Qinyuan Ren
TL;DR
This work introduces GPTQT, a new post-training quantization method that represents LLM weights in 3 bits / 2 bits to reduce memory usage and increase processing speed. In testing, GPTQT further reduced perplexity by 4.01 on opt-66B compared to a strong 3-bit quantization baseline and achieved a 1.24× speedup on opt-30b, making it the current best binary-coding quantization method for such LLMs.
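The "quantize twice" idea can be illustrated with a minimal sketch: first quantize weights uniformly at a higher bit width, then re-quantize those integers down to a low bit width, folding the two scales into one dequantization factor. This is only an illustrative toy (the function name and staging are assumptions, not the actual GPTQT algorithm), assuming nonzero weights:

```python
def quantize_twice(weights, bits_stage1=8, bits_stage2=3):
    """Toy two-stage quantization sketch (NOT the actual GPTQT algorithm).

    Stage 1: uniform symmetric quantization at a higher bit width.
    Stage 2: re-quantize the stage-1 integers to a low bit width.
    Returns the low-bit integers and the combined dequantization scale.
    """
    qmax1 = 2 ** (bits_stage1 - 1) - 1
    s1 = max(abs(w) for w in weights) / qmax1
    q1 = [max(-qmax1 - 1, min(qmax1, round(w / s1))) for w in weights]

    qmax2 = 2 ** (bits_stage2 - 1) - 1
    s2 = max(abs(q) for q in q1) / qmax2
    q2 = [max(-qmax2 - 1, min(qmax2, round(q / s2))) for q in q1]

    # Approximate reconstruction: w ≈ q2[i] * (s1 * s2)
    return q2, s1 * s2

# Example: 3-bit codes stay in [-4, 3]; w_hat = q * scale recovers w roughly.
q, scale = quantize_twice([0.5, -1.0, 0.25, 0.75])
```

The benefit of the two-stage view is that inference only needs the low-bit codes and a single fused scale, while the intermediate higher-bit step preserves more precision during the quantization process itself.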
Abstract
Due to their large size, generative large language models (LLMs) require significant computing and storage resources. This paper introduces a new…