transformers gain popularity because of their superior prediction accuracy and inference throughput. However, the transformer is computation-intensive, causing a long inference time. The existing work to accelerate transformer inferences has limitations because of the changes to transf