Ben Zandonati, Adrian Alan Pol, Maurizio Pierini, Olya Sirkin, Tal Kopetz
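TL;DR: This paper proposes a method for quantizing deep learning models using FIT, which combines Fisher information with a model of quantization. The method can effectively estimate the final performance of a network and applies to layer-wise and mixed-precision quantization configurations, improving model compression efficiency.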
Abstract
Model compression is vital to the deployment of deep learning on edge
devices. Low precision representations, achieved via quantization of weights
and activations, can reduce inference time and memory requirement