Foundations of Large Language Model Compression -- Part 1: Weight Quantization

Sep, 2024

Sean I. Young

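TL;DR: This work addresses the problem of deploying large language models on resource-constrained devices and reducing their computational cost. It proposes CVXQ, a weight quantization method based on convex optimization that outperforms prior techniques. Notably, the method can flexibly compress a model to any specified size and scales to models with hundreds of billions of weight parameters.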

Abstract

In recent years, compression of Large Language Models (LLMs) has emerged as an important problem to allow language model deployment on resource-constrained devices, reduce computational costs, and mitigate the environmental footprint of large-scale AI infrastructure. In this paper, we