BriefGPT.xyz
Nov, 2020
HAWQV3: Dyadic Neural Network Quantization
Zhewei Yao, Zhen Dong, Zhangcheng Zheng, Amir Gholami, Jiali Yu...
TL;DR
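HAWQV3 proposes a new mixed-precision, integer-only quantization framework. By combining integer-only arithmetic, hardware-aware mixed-precision quantization, and direct hardware deployment, it achieves both model compression and faster quantized inference. Its INT8 quantization improves accuracy by 2.68% over previous integer-only methods, and its mixed-precision INT4/8 quantization reduces latency by 23% relative to INT8 while still reaching 76.73% accuracy.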
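As a rough illustration of what "dyadic" integer-only rescaling can look like, the sketch below approximates a real-valued scale by a dyadic number b / 2^c and applies it with one integer multiply and one bit shift. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `to_dyadic` and `requantize`, the 31-bit shift, and the rounding rule are all hypothetical choices made for the example.

```python
from typing import Tuple


def to_dyadic(scale: float, shift_bits: int = 31) -> Tuple[int, int]:
    """Approximate a positive real scale as b / 2**c with integers b and c.

    Hypothetical helper: the 31-bit default is an assumption for this sketch.
    """
    b = round(scale * (1 << shift_bits))
    return b, shift_bits


def requantize(acc: int, b: int, c: int) -> int:
    """Rescale an integer accumulator by b / 2**c using only integer operations.

    Adding 2**(c - 1) before the shift rounds to nearest, with ties going
    toward positive infinity (Python's >> on ints is a floor shift).
    """
    return (acc * b + (1 << (c - 1))) >> c


# Usage: rescale an accumulator without any floating-point arithmetic
# at inference time (the float is only used once, offline, in to_dyadic).
b, c = to_dyadic(0.25)
result = requantize(100, b, c)
```

The floating-point scale is only touched once, when the dyadic pair is computed ahead of time; the per-value work is a multiply, an add, and a shift, which is the kind of operation integer-only hardware supports directly.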
Abstract
Quantization is one of the key techniques used to make Neural Networks (NNs) faster and more energy efficient. However, current low precision quantization algorithms often have the hidden cost of conversion back