Because neural networks are over-parameterized, many model compression
methods based on pruning and quantization have emerged. They are effective at
reducing the size, parameter count, and computational complexity of the model.
However, most of the models compressed by such meth