Nov, 2019
Preparing Lessons: Improve Knowledge Distillation with Better Supervision
Tiancheng Wen, Shenqi Lai, Xueming Qian
TL;DR
This work proposes two novel methods, Knowledge Adjustment (KA) and Dynamic Temperature Distillation (DTD), which penalize incorrect supervision and improve the student model; experiments show that the approach achieves encouraging performance on various benchmark datasets, as well as when combined with other knowledge-distillation-based methods.
Abstract
Knowledge distillation (KD) is widely used for training a compact model with the supervision of another large model, which could effectively improve the performance. Previous methods mainly focus on two aspects:
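
For context, below is a minimal sketch of the standard KD objective (Hinton et al.) that this line of work builds on: a KL-divergence term between temperature-softened teacher and student distributions plus the usual cross-entropy on hard labels. The temperature `T` and mixing weight `alpha` are illustrative hyperparameters; the paper's KA and DTD modifications are not implemented here.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Vanilla knowledge-distillation loss (not the paper's KA/DTD variant):
    a weighted sum of the KL divergence between temperature-softened
    teacher/student distributions and cross-entropy on the ground-truth labels."""
    # Soft targets from the teacher, softened by temperature T.
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    log_soft_student = F.log_softmax(student_logits / T, dim=1)
    # KL term is scaled by T^2 to keep gradient magnitudes comparable
    # across different temperatures.
    distill = F.kl_div(log_soft_student, soft_teacher,
                       reduction="batchmean") * (T * T)
    # Hard-label supervision on the student.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1.0 - alpha) * ce
```

In this framing, DTD would replace the fixed `T` with a per-sample dynamic temperature, and KA would correct soft targets where the teacher's prediction is wrong, per the TL;DR above.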