Oct, 2024
SLiM: One-shot Quantized Sparse Plus Low-rank Approximation of LLMs
Mohammad Mozaffari, Maryam Mehri Dehnavi
TL;DR
This work addresses the high memory consumption and slow inference of large language models (LLMs) with a new compression method called SLiM. By combining symmetric quantization with saliency-based low-rank approximation in a one-shot procedure, SLiM eliminates costly retraining, significantly improves compressed-model accuracy, and demonstrates the potential for efficiently deploying large models in memory-constrained environments.
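The decomposition described above can be sketched in a few lines of NumPy. This is an illustrative simplification, not the paper's algorithm: magnitude pruning stands in for SLiM's saliency-based selection, and a plain truncated SVD stands in for its low-rank approximation; the function names and hyperparameters (`sparsity`, `rank`, `bits`) are assumptions for the sketch.

```python
import numpy as np

def symmetric_quantize(w, bits=4):
    # Symmetric (zero-point-free) uniform quantization; returns
    # dequantized values so the approximation error can be measured.
    absmax = np.abs(w).max()
    qmax = 2 ** (bits - 1) - 1
    scale = absmax / qmax if absmax > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale

def sparse_plus_lowrank(W, sparsity=0.5, rank=8, bits=4):
    # One-shot sketch of W ~ Q(S) + L @ R.
    # 1) keep the largest-magnitude entries as the sparse part S
    #    (a stand-in for the paper's saliency-based selection)
    k = int(W.size * (1 - sparsity))
    thresh = np.partition(np.abs(W).ravel(), -k)[-k]
    S = W * (np.abs(W) >= thresh)
    # 2) quantize the sparse part with a symmetric quantizer
    S_q = symmetric_quantize(S, bits=bits)
    # 3) low-rank correction of the residual via truncated SVD
    U, s, Vt = np.linalg.svd(W - S_q, full_matrices=False)
    L = U[:, :rank] * s[:rank]
    R = Vt[:rank]
    return S_q, L, R

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
S_q, L, R = sparse_plus_lowrank(W)
err_sparse = np.linalg.norm(W - S_q)
err_total = np.linalg.norm(W - (S_q + L @ R))
```

Because the low-rank term is fit to the residual of the quantized sparse approximation, `err_total` is strictly smaller than `err_sparse`, which is the intuition behind pairing the two compression mechanisms without any retraining.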
Abstract
Large Language Models (LLMs) have revolutionized natural language understanding and generation tasks but suffer from high memory consumption and slow inference times due to their large parameter sizes. Traditional model …