Logit-based knowledge distillation (KD) for classification is cost-efficient compared to feature-based KD, but it often performs worse. Recently, it was shown that the performance of logit-based KD can be improved by effectively delivering the teacher model's probability distribution over the non-target classes, known as "implicit (dark) knowledge", to the student model. Through gradient analysis, we first show that this has the effect of adaptively controlling the learning of implicit knowledge. We then propose a new loss that enables the student to learn explicit knowledge (i.e., the teacher's confidence about the target class) along with implicit knowledge in an adaptive manner. Furthermore, we propose separating the classification and distillation tasks for effective distillation and inter-class relationship modeling. Experimental results demonstrate that the proposed method, called adaptive explicit knowledge transfer (AEKT), achieves improved performance over state-of-the-art KD methods on the CIFAR-100 and ImageNet datasets.
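
To make the two kinds of knowledge concrete, the following minimal sketch splits a teacher's softened softmax output into the target-class probability (explicit knowledge) and the renormalized distribution over the non-target classes (implicit knowledge). It only illustrates these definitions and is not the AEKT loss itself; the function names, the example logits, and the temperature value are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def split_knowledge(teacher_logits, target, temperature=1.0):
    """Split teacher probabilities into explicit and implicit parts.

    explicit: the teacher's probability for the target class.
    implicit: the distribution over non-target classes, renormalized to sum to 1.
    (Hypothetical helper for illustration, not taken from the paper.)
    """
    probs = softmax(teacher_logits, temperature)
    explicit = probs[target]
    rest = 1.0 - explicit
    if rest <= 0.0:
        raise ValueError("teacher is fully confident; no non-target mass left")
    implicit = [p / rest for i, p in enumerate(probs) if i != target]
    return explicit, implicit


# Assumed example: four classes, class 0 is the ground-truth target.
explicit, implicit = split_knowledge([4.0, 1.0, 0.5, -1.0], target=0, temperature=4.0)
print("explicit (target confidence):", explicit)
print("implicit (non-target distribution):", implicit)
```

In this decomposition, a student that matches only `implicit` learns the inter-class relationships among non-target classes, while matching `explicit` as well conveys how confident the teacher is about the target class.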

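This work addresses the lower performance of logit-based knowledge distillation for classification. It introduces a new loss function that lets the student model adaptively learn both explicit and implicit knowledge. It also proposes separating the classification and distillation tasks. Experimental results show that the proposed adaptive explicit knowledge transfer method outperforms existing state-of-the-art distillation methods on the CIFAR-100 and ImageNet datasets.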

Adaptive Explicit Knowledge Transfer for Knowledge Distillation