March 2024
Magic for the Age of Quantized DNNs
Yoshihide Sawada, Ryuji Saiin, Kazuma Suetake
TL;DR
This paper proposes a quantization-aware training method. It introduces Layer-Batch Normalization, a novel normalization that is independent of the mini-batch size, quantizes weights by applying a scaled round-clip function to standardized weights, quantizes activations with the same function, and trains the model using surrogate gradients. Experiments demonstrate that the proposed quantization achieves minimal degradation in accuracy.
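As a rough illustration of the recipe the TL;DR describes, the sketch below implements a generic scaled round-clip quantizer applied to standardized weights, with a straight-through estimator standing in for the surrogate gradient. It is only a sketch under assumptions: the PyTorch framing, the function names (`scaled_round_clip`, `RoundClipSTE`, `quantize_weights`), the 4-bit default, the per-row standardization, and the 3-sigma clipping range are illustrative choices, not details taken from the paper.

```python
import torch


def scaled_round_clip(x: torch.Tensor, num_bits: int = 4, scale: float = 1.0) -> torch.Tensor:
    """Generic scaled round-clip quantizer: map x onto 2**num_bits - 1 signed
    levels spaced uniformly inside [-scale, scale]."""
    qmax = 2 ** (num_bits - 1) - 1
    x_int = torch.round(torch.clamp(x / scale * qmax, -qmax, qmax))
    return x_int / qmax * scale


class RoundClipSTE(torch.autograd.Function):
    """Quantize in the forward pass; in the backward pass use a straight-through
    surrogate gradient, passed only where the input was not clipped."""

    @staticmethod
    def forward(ctx, x, num_bits, scale):
        ctx.save_for_backward(x)
        ctx.scale = scale
        return scaled_round_clip(x, num_bits, scale)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        pass_through = (x.abs() <= ctx.scale).to(grad_output.dtype)
        return grad_output * pass_through, None, None


def quantize_weights(w: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    """Standardize each row of a 2-D weight matrix to zero mean / unit variance,
    then quantize with the scaled round-clip function (the 3-sigma clip range is
    an arbitrary illustrative choice)."""
    w_std = (w - w.mean(dim=1, keepdim=True)) / (w.std(dim=1, keepdim=True) + 1e-5)
    return RoundClipSTE.apply(w_std, num_bits, 3.0)


if __name__ == "__main__":
    w = torch.randn(8, 16, requires_grad=True)
    w_q = quantize_weights(w, num_bits=4)   # forward: discrete weight values
    w_q.sum().backward()                    # backward: straight-through surrogate gradient
    print(w_q.detach().unique().numel(), "distinct levels; grad shape", w.grad.shape)
```

In a full quantization-aware training loop, the same round-clip function would also be applied to the activations (after the mini-batch-size-independent normalization), again with a surrogate gradient, as the TL;DR describes.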
Abstract
Recently, the number of parameters in DNNs has explosively increased, as exemplified by LLMs (Large Language Models), making inference on small-scale computers more difficult. Model compression technology is, the…