Jun, 2024
Low-Rank Quantization-Aware Training for LLMs
Yelysei Bondarenko, Riccardo Del Chiaro, Markus Nagel
TL;DR
Large language models face ever-growing compute and memory demands. To address this, we propose LR-QAT, a lightweight and memory-efficient quantization-aware training algorithm. By combining low-rank auxiliary weights, a downcasting operator based on fixed-point or double-packed integers, and checkpointing, LR-QAT saves memory without sacrificing predictive performance. The method applies across a range of quantization settings, combines seamlessly with most PTQ techniques, and matches the predictive performance of full-model QAT at a fraction of its memory usage.
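To make the components named above concrete, below is a minimal PyTorch-style sketch of a linear layer that trains only low-rank auxiliary weights inside the quantizer while keeping the frozen pretrained weight in a low-precision buffer. The class name, the simple symmetric per-tensor scale, and the INT8 stand-in for the downcasting operator are illustrative assumptions, not the paper's reference implementation.

```python
import torch
import torch.nn as nn

class LRQATLinear(nn.Module):
    """Sketch of a low-rank quantization-aware linear layer (LR-QAT-like).

    The frozen pretrained weight is stored in a low-precision form (here a
    crude INT8 "downcast"), and two small low-rank matrices A, B are the only
    trainable weight parameters. The low-rank update is added inside the
    fake-quantizer so that, after training, A @ B can be folded into a single
    low-bit integer weight tensor.
    """

    def __init__(self, weight: torch.Tensor, rank: int = 32, n_bits: int = 4):
        super().__init__()
        self.qmin = -(2 ** (n_bits - 1))
        self.qmax = 2 ** (n_bits - 1) - 1

        # Symmetric per-tensor scale from the pretrained weight range
        # (assumption; learned or per-channel scales are also possible).
        scale = weight.abs().max() / self.qmax
        self.register_buffer("scale", scale)

        # Downcasting operator: store the frozen weight in INT8 instead of
        # FP16/FP32 to save memory. Only a rough stand-in for the paper's
        # fixed-point / double-packed integer variants.
        self.register_buffer(
            "w0_int8",
            torch.clamp(torch.round(weight / scale), -128, 127).to(torch.int8),
        )

        out_features, in_features = weight.shape
        # Trainable low-rank auxiliary factors, initialised so A @ B == 0.
        self.A = nn.Parameter(torch.zeros(out_features, rank))
        self.B = nn.Parameter(torch.randn(rank, in_features) * 0.01)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Recover the (scaled) frozen weight and add the low-rank correction.
        w = self.w0_int8.float() + self.A @ self.B
        # Fake-quantize with a straight-through estimator for the rounding.
        w_q = torch.clamp(w + (torch.round(w) - w).detach(), self.qmin, self.qmax)
        return nn.functional.linear(x, self.scale * w_q)
```

After training, the low-rank product A @ B can in principle be folded into the rounded integer weights, so inference uses a single low-bit weight tensor with no extra adapters; memory during training is dominated by the INT8 buffer and the small A, B factors rather than a full-precision weight copy.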
Abstract
Large language models (LLMs) are omnipresent; however, their practical deployment is challenging due to their ever-increasing computational and memory demands. Quantization is one of the most effective ways to make them more compute- and memory-efficient.