Chen Tang, Kai Ouyang, Zhi Wang, Yifei Zhu, Yaowei Wang...
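TL;DR: This work proposes using the learnable parameters of the quantizers as indicators of layer importance for quantization precision. It determines the optimal bit-widths for mixed-precision quantization by solving a single integer linear program, which improves time efficiency, and it achieves SOTA accuracy across a variety of models.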
Abstract
The exponentially large discrete search space in mixed-precision quantization (MPQ) makes it hard to determine the optimal bit-width for each layer. Previous works usually resort to iterative search methods on the training set, which consume hundreds or even thousands of GPU-hours. In