Preparing Lessons: Improve Knowledge Distillation with Better Supervision

Nov, 2019

Tiancheng Wen, Shenqi Lai, Xueming Qian

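TL;DR: This work proposes two novel methods, Knowledge Adjustment (KA) and Dynamic Temperature Distillation (DTD), which penalize bad supervision and improve the student model. Experiments show encouraging performance on a range of benchmark datasets, as well as when the methods are combined with other KD-based approaches.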

Abstract

Knowledge distillation (KD) is widely used to train a compact model under the supervision of a larger model, which can effectively improve the compact model's performance. Previous methods mainly focus on two aspects: