May 2024
I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models
Xing Hu, Yuan Chen, Dawei Yang, Sifan Zhou, Zhihang Yuan...
TL;DR
This paper proposes I-LLM, a novel integer-only post-training quantization framework, to address the fact that large language models still require a substantial number of floating-point operations when deployed on edge and cloud devices. Experiments show that I-LLM can operate at W4A4 while preserving accuracy, outperforming other non-integer quantization methods.
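The W4A4 setting means both weights and activations are quantized to 4-bit integers, so the core matrix multiplications can run in integer arithmetic. The following is a minimal NumPy sketch of that general idea for a single linear layer, assuming symmetric per-tensor quantization; it is not I-LLM's actual algorithm, and all function names are illustrative.

```python
# Minimal sketch (NOT the paper's I-LLM kernels): W4A4 means weights and
# activations are quantized to 4-bit integers, the matmul accumulates in
# int32, and floating-point scales are only applied in one final rescale.
import numpy as np

def quantize_sym(x: np.ndarray, n_bits: int = 4):
    """Symmetric per-tensor quantization to signed n_bits integers."""
    qmax = 2 ** (n_bits - 1) - 1                 # e.g. 7 for 4-bit
    scale = np.abs(x).max() / qmax               # FP scale, computed offline in PTQ
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def int_matmul(qa, sa, qw, sw):
    """Integer matmul with int32 accumulation; one rescale at the end."""
    acc = qa.astype(np.int32) @ qw.astype(np.int32)
    return acc * (sa * sw)                       # dequantize the accumulator

# Usage: compare the integer path against the FP32 reference.
rng = np.random.default_rng(0)
a = rng.standard_normal((8, 64)).astype(np.float32)   # activations
w = rng.standard_normal((64, 32)).astype(np.float32)  # weights
qa, sa = quantize_sym(a)   # A4
qw, sw = quantize_sym(w)   # W4
out = int_matmul(qa, sa, qw, sw)
print(f"mean abs error vs FP32: {np.abs(out - a @ w).mean():.4f}")
```

Note that this sketch covers only the linear layers; the remaining floating-point operations in a transformer are exactly the gap the paper's fully-quantized, integer-only approach targets.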
Abstract
Post-training quantization (PTQ) serves as a potent technique to accelerate the inference of large language models (LLMs). Nonetheless, existing works still necessitate a considerable number of floating-point (FP) operations […]