BriefGPT.xyz
Jun, 2023
SqueezeLLM: Dense-and-Sparse Quantization
Sehoon Kim, Coleman Hooper, Amir Gholami, Zhen Dong, Xiuyu Li...
TL;DR
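SqueezeLLM is a post-training quantization framework that enables lossless compression at precisions as low as 3 bits and achieves higher quantization performance under the same memory constraint.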
Abstract
Generative large language models (LLMs) have demonstrated remarkable results for a wide range of tasks. However, deploying these models for inference has been a significant challenge due to their unprecedented re…