Logit knowledge distillation has attracted increasing attention in recent
studies due to its practicality. However, it often suffers from inferior performance
compared to feature knowledge distillation. In this paper, we argue that
existing logit-based methods may be sub-optimal since they o