Knowledge distillation transfers soft labels from a teacher to a student through a temperature-scaled softmax function whose temperature is shared by both models. However, sharing one temperature between teacher and student implicitly requires their logits to match exactly in range and variance. This side effect limits the student's performance, given the capacity gap between the two models and the finding that the teacher's intrinsic logit relations are sufficient for the student to learn from. To address this issue, we propose setting the temperature to the weighted standard deviation of the logits and applying a plug-and-play Z-score pre-process of logit standardization before the softmax and the Kullback-Leibler divergence. Our pre-process lets the student focus on the essential logit relations of the teacher rather than on matching logit magnitudes, and it can improve the performance of existing logit-based distillation methods. We also present a typical case in which the conventional shared-temperature setting cannot reliably yield an authentic evaluation of distillation; our Z-score pre-process successfully alleviates this problem. We extensively evaluate our method with various student and teacher models on CIFAR-100 and ImageNet and demonstrate its significant superiority. Vanilla knowledge distillation equipped with our pre-process performs favorably against state-of-the-art methods, and other distillation variants obtain considerable gains when combined with it.
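
As a rough illustration of the pre-process described in the abstract, the following minimal sketch standardizes each model's logits with a Z-score before the softmax and the Kullback-Leibler divergence. It is not the authors' implementation: the function names, the `base_temperature` parameter (standing in here for the weight on the standard deviation), the `eps` constant, and the use of the population standard deviation are all assumptions made for this example.

```python
import math


def standardize(logits, base_temperature=2.0, eps=1e-7):
    """Z-score one sample's logits, then divide by a base temperature.

    Assumption: the effective temperature is base_temperature * std(logits),
    one reading of "weighted standard deviation" in the abstract.
    """
    mean = sum(logits) / len(logits)
    variance = sum((z - mean) ** 2 for z in logits) / len(logits)
    std = math.sqrt(variance)
    return [(z - mean) / ((std + eps) * base_temperature) for z in logits]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def standardized_kd_loss(teacher_logits, student_logits, base_temperature=2.0):
    """KD loss with Z-score standardization applied to both sets of logits."""
    p = softmax(standardize(teacher_logits, base_temperature))
    q = softmax(standardize(student_logits, base_temperature))
    return kl_divergence(p, q)


def shared_temperature_kd_loss(teacher_logits, student_logits, temperature=2.0):
    """Conventional KD loss with a single temperature shared by both models."""
    p = softmax([z / temperature for z in teacher_logits])
    q = softmax([z / temperature for z in student_logits])
    return kl_divergence(p, q)


# Hypothetical logits: the student preserves the teacher's logit relations
# (student = 0.25 * teacher + 0.5) but has a much smaller range.
teacher = [8.0, 2.0, -4.0]
student = [2.5, 1.0, -0.5]

loss_shared = shared_temperature_kd_loss(teacher, student)
loss_standardized = standardized_kd_loss(teacher, student)
print("shared temperature:", loss_shared)
print("Z-score standardized:", loss_standardized)
```

In this toy case the shared-temperature loss still penalizes the student for its smaller logit range, while the standardized loss is close to zero because the two logit vectors differ only by a shift and a positive scale.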

Logit Standardization in Knowledge Distillation