BriefGPT.xyz
Feb, 2021
VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference
Steve Dai, Rangharajan Venkatesan, Haoxing Ren, Brian Zimmer, William J. Dally...
TL;DR
This work proposes applying a separate scale factor to each small vector within a tensor dimension to reduce quantization-induced accuracy loss. The technique improves inference accuracy for convolutional neural networks while delivering higher hardware efficiency and lower energy consumption in deep learning accelerator designs.
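The core idea of per-vector scaling can be illustrated with a minimal NumPy sketch. This is a hypothetical simplification (the function name, vector size of 16, and signed 4-bit range are illustrative assumptions, not the paper's exact scheme, which also covers two-level scale factor quantization and hardware mapping):

```python
import numpy as np

def per_vector_quantize(weights, vector_size=16, bits=4):
    """Illustrative sketch: one scale factor per small vector of a tensor.

    Each contiguous group of `vector_size` elements gets its own scale,
    so an outlier in one vector does not inflate the quantization step
    of every other vector (as a single per-tensor scale would).
    """
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for signed 4-bit integers
    # Pad so the flat array splits evenly into vectors.
    pad = (-len(weights)) % vector_size
    vectors = np.pad(weights, (0, pad)).reshape(-1, vector_size)
    # One scale per vector: map the vector's max magnitude to qmax.
    scales = np.abs(vectors).max(axis=1, keepdims=True) / qmax
    scales[scales == 0] = 1.0  # avoid division by zero for all-zero vectors
    q = np.clip(np.round(vectors / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

# Dequantization multiplies each vector by its own scale, so the
# reconstruction error stays bounded by half a step *per vector*.
w = np.random.randn(64).astype(np.float32)
q, s = per_vector_quantize(w)
w_hat = (q * s).reshape(-1)[: len(w)]
```

With a single per-tensor scale, one large outlier coarsens the step size for the whole tensor; per-vector scales localize that effect to 16 elements, which is the accuracy benefit the TL;DR describes.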
Abstract
Quantization enables efficient acceleration of deep neural networks by reducing model memory footprint and exploiting low-cost integer math hardware units. Quantization maps floating-point weights and activations →