BriefGPT.xyz
Aug, 2024
The Uniqueness of LLaMA3-70B with Per-Channel Quantization: An Empirical Study
Minghai Qin
TL;DR
This study addresses the distinctive accuracy-degradation behavior of the LLaMA3-70B model under post-training quantization with 8-bit integer weights and 8-bit integer activations (W8A8). We propose a mixed strategy that applies finer-grained W8A8 quantization to fewer than 3% of the layers, substantially improving LLaMA3-70B's performance on reasoning tasks and raising accuracy from 45.5% to 73.4%. This finding offers a new direction for the efficient deployment of large language models.
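The W8A8 scheme summarized above quantizes weights per channel, i.e., with one scale per output channel rather than one scale for the whole tensor. A minimal sketch of symmetric per-channel int8 weight quantization in NumPy (function names and the toy matrix are illustrative, not from the paper):

```python
import numpy as np

def quantize_per_channel_int8(w: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Symmetric per-channel int8 quantization: one scale per output row."""
    # Scale each row so its largest-magnitude weight maps to +/-127.
    scales = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scales = np.where(scales == 0, 1.0, scales)  # guard against all-zero rows
    q = np.clip(np.round(w / scales), -128, 127).astype(np.int8)
    return q, scales

def dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Recover an approximate float matrix from int8 values and per-row scales."""
    return q.astype(np.float32) * scales

# Toy weight matrix: rows with very different magnitudes show why per-channel
# scales help -- the small-magnitude row keeps its own fine-grained scale.
w = np.array([[0.01, -0.02, 0.015],
              [5.00, -3.00, 4.000]], dtype=np.float32)
q, s = quantize_per_channel_int8(w)
w_hat = dequantize(q, s)
print(np.max(np.abs(w - w_hat)))  # per-row reconstruction error stays small
```

With a single per-tensor scale, the first row would collapse to near-zero integers; per-channel scales keep each row's quantization error bounded by half its own scale.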
Abstract
We have observed a distinctive quantization-related behavior in the LLaMA3/3.1-70B models that is absent in both the LLaMA2-70B and LLaMA3/3.1-8B/405B models. Quantization is a crucial technique for deploying lar…