Logit knowledge distillation attracts increasing attention due to its practicality in recent studies. However, it often suffers inferior performance compared to the feature knowledge distillation. In this paper, we argue that existing logit-based methods may be sub-optimal since they only leverage the global logit output that couples multiple semantic knowledge. This may transfer ambiguous knowledge to the student and mislead its learning. To this end, we propose a simple but effective method, i.e., Scale Decoupled Distillation (SDD), for logit knowledge distillation. SDD decouples the global logit output into multiple local logit outputs and establishes distillation pipelines for them. This helps the student to mine and inherit fine-grained and unambiguous logit knowledge. Moreover, the decoupled knowledge can be further divided into consistent and complementary logit knowledge that transfers the semantic information and sample ambiguity, respectively. By increasing the weight of complementary parts, SDD can guide the student to focus more on ambiguous samples, improving its discrimination ability. Extensive experiments on several benchmark datasets demonstrate the effectiveness of SDD for wide teacher-student pairs, especially in the fine-grained classification task. Code is available at: https://github.com/shicaiwei123/SDD-CVPR2024

本文主要介绍了一种新的逻辑知识蒸馏方法，即基于比例分离的蒸馏方法（SDD），通过将全局逻辑输出解耦成多个局部逻辑输出，并建立相应的蒸馏管道，帮助学生模型挖掘和继承细粒度和明确的逻辑知识，从而提高其识别能力。这种方法尤其在细粒度分类任务中展现了出色的效果。

尺度解耦蒸馏