When training neural networks with simulated quantization, we observe that quantized weights can, rather unexpectedly, oscillate between two grid-points. The importance of this effect and its impact on quantization-aware training are not well-understood or investigated in literature. In this paper, we delve deeper into the phenomenon of weight oscillations and show that it can lead to a significant accuracy degradation due to wrongly estimated batch-normalization statistics during inference and increased noise during training. These effects are particularly pronounced in low-bit ($\leq$ 4-bits) quantization of efficient networks with depth-wise separable layers, such as MobileNets and EfficientNets. In our analysis we investigate several previously proposed quantization-aware training (QAT) algorithms and show that most of these are unable to overcome oscillations. Finally, we propose two novel QAT algorithms to overcome oscillations during training: oscillation dampening and iterative weight freezing. We demonstrate that our algorithms achieve state-of-the-art accuracy for low-bit (3 & 4 bits) weight and activation quantization of efficient architectures, such as MobileNetV2, MobileNetV3, and EfficentNet-lite on ImageNet.

本文研究神经网络的量化问题，发现在低比特率下，深度可分离网络（如MobileNets,EfficientNets）量化训练中，量化权重可能出现意外震荡，导致在推断过程中统计错误、在训练过程中增加噪声，进而显著降低准确性。作者提出了两种新的QAT算法，分别是自适应调节震荡和迭代冻结权重，相较已有算法都表现出了更好的效果。

量化感知训练中克服振荡问题