large language models(LLMs) containing tens of billions of parameters (or
even more) have demonstrated impressive capabilities in various NLP tasks.
However, substantial model size poses challenges to training, inference, and
deployment so that it is necessary to compress the model. At