BriefGPT.xyz
Jul, 2024
LeanQuant: Accurate Large Language Model Quantization with Loss-Error-Aware Grid
Tianyi Zhang, Anshumali Shrivastava
TL;DR
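With the LeanQuant weight quantization technique, large language models can effectively reduce decoding latency and memory requirements, and perform favorably against competitive baselines in the 4-bit, 3-bit, and 2-bit regimes.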
Abstract
Large language models (LLMs) have numerous applications across various domains, but their high computational and memory demands pose significant deployment challenges. Weight quantization is an effective technique ...