BriefGPT.xyz
Sep, 2023
MixQuant: Mixed Precision Quantization with a Bit-width Optimization Search
Eliska Kloberdanz, Wei Le
TL;DR
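Quantization is a technique for creating efficient deep neural networks: performing computations and storing tensors at bit-widths lower than 32-bit floating point precision reduces model size and inference latency. However, quantization can cause numerical instability through round-off error, which lowers the accuracy of the quantized model. MixQuant is a search algorithm that uses round-off error to find the optimal custom quantization bit-width for each layer's weights.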
Abstract
Quantization is a technique for creating efficient deep neural networks (DNNs), which involves performing computations and storing tensors at lower bit-widths than f32 floating point precision.
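To make the idea of choosing a per-layer bit-width from round-off error concrete, here is a minimal Python sketch. It is not the MixQuant algorithm itself, whose details are not given on this page. It assumes symmetric uniform quantization, mean squared error as the round-off measure, and a fixed error tolerance; the names `quantize_dequantize`, `roundoff_mse`, `pick_bit_width`, the candidate bit-widths, and the toy layers are all hypothetical.

```python
import random


def quantize_dequantize(weights, bits):
    """Round weights onto a signed `bits`-bit integer grid, then map back to floats.

    Assumption: symmetric uniform quantization with one scale per layer.
    """
    qmax = 2 ** (bits - 1) - 1              # largest representable integer level
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)                # all-zero layer: nothing to round
    scale = max_abs / qmax                  # float value of one integer step
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def roundoff_mse(weights, bits):
    """Mean squared round-off error introduced by quantizing at `bits` bits."""
    deq = quantize_dequantize(weights, bits)
    return sum((w - d) ** 2 for w, d in zip(weights, deq)) / len(weights)


def pick_bit_width(weights, candidates=(2, 3, 4, 5, 6, 7, 8), tolerance=1e-4):
    """Return the smallest candidate bit-width whose round-off MSE is within tolerance.

    Falls back to the largest candidate if none qualifies. The tolerance and the
    candidate set are illustrative assumptions, not values from the paper.
    """
    for bits in sorted(candidates):
        if roundoff_mse(weights, bits) <= tolerance:
            return bits
    return max(candidates)


# Toy "layers": one with a narrow weight distribution, one with a wide one.
rng = random.Random(0)
layers = {
    "layer_narrow": [rng.gauss(0.0, 0.02) for _ in range(512)],
    "layer_wide": [rng.gauss(0.0, 0.5) for _ in range(512)],
}

chosen = {name: pick_bit_width(w) for name, w in layers.items()}
for name, bits in chosen.items():
    print(f"{name}: {bits} bits, round-off MSE = {roundoff_mse(layers[name], bits):.2e}")
```

Under these assumptions, a layer whose weights span a wider range needs more bits to stay within the same error tolerance, which is the intuition behind assigning bit-widths per layer rather than one bit-width for the whole model.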