BriefGPT.xyz
Feb, 2024
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
Wei Huang, Yangdong Liu, Haotong Qin, Ying Li, Shiming Zhang...
TL;DR
BiLLM is a novel 1-bit post-training quantization scheme tailored to pretrained large language models. It achieves high-accuracy inference with weights of only 1.08 bits across various LLM families and evaluation metrics, surpassing the SOTA quantization methods for LLMs. Moreover, BiLLM can binarize an LLM with 7 billion weights within 0.5 hours on a single GPU, demonstrating satisfactory time efficiency.
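To make the "1-bit weights" idea concrete, here is a minimal sketch of plain weight binarization, not BiLLM's actual algorithm: each weight row is replaced by a sign matrix plus one per-row scaling factor, the classic closed-form minimizer of the binarization error. The function names are illustrative, not from the paper.

```python
import numpy as np

def binarize_rows(W: np.ndarray):
    """Naive 1-bit quantization: W ≈ alpha * sign(W), per row.

    alpha = mean(|W|) per row is the closed-form minimizer of
    ||W - alpha * B||_F with B in {-1, +1}. This is a generic
    illustration, not BiLLM's residual/salient-weight scheme.
    """
    B = np.sign(W)
    B[B == 0] = 1.0  # map exact zeros to +1 so B stays binary
    alpha = np.abs(W).mean(axis=1, keepdims=True)
    return alpha, B

def dequantize(alpha: np.ndarray, B: np.ndarray) -> np.ndarray:
    # Reconstruct the (lossy) full-precision approximation.
    return alpha * B

W = np.array([[0.4, -0.2, 0.6],
              [-1.0, 0.5, -0.5]])
alpha, B = binarize_rows(W)
W_hat = dequantize(alpha, B)
```

Storing `B` costs 1 bit per weight plus a small per-row overhead for `alpha`, which is why practical binarization schemes report averages slightly above 1 bit per weight, such as the 1.08 bits cited here.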
Abstract
Pretrained large language models (LLMs) exhibit exceptional general language processing capabilities but come with significant demands on memory and computational resources. As a powerful compression technology, …