Compared with moderately sized neural network models, structural weight pruning of large language models (LLMs) poses a new challenge for the efficiency of pruning algorithms, owing to the heavy computation and memory demands of LLMs. Recent efficient LLM pruning methods typ