Jan, 2020
Search for Better Students to Learn Distilled Knowledge
Jindong Gu, Volker Tresp
TL;DR
This paper proposes to automatically search for the optimal student architecture for knowledge distillation by selecting a subgraph of the teacher network as the student, using L1-norm optimization. Experiments on the CIFAR datasets show that the learned student outperforms manually specified student architectures; the learned student model is also visualized and interpreted.
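The TL;DR describes channel selection driven by an L1 sparsity penalty combined with a distillation loss. Below is a minimal PyTorch sketch of that idea; the names (GatedConv, kd_loss, loss_fn), the temperature T, and the penalty weight lambda_l1 are illustrative assumptions, not the authors' implementation.

```python
# Sketch, assuming the method gates teacher channels with learnable scales:
# gates pushed to ~0 by an L1 penalty mark channels to prune, so the surviving
# subgraph of the teacher becomes the searched student architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedConv(nn.Module):
    """Conv layer whose output channels are scaled by learnable gates."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.gate = nn.Parameter(torch.ones(out_ch))  # one gate per channel

    def forward(self, x):
        return self.conv(x) * self.gate.view(1, -1, 1, 1)

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Soft-target distillation loss (Hinton-style), scaled by T^2."""
    p_t = F.softmax(teacher_logits / T, dim=1)
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * T * T

def loss_fn(student, student_logits, teacher_logits, lambda_l1=1e-4):
    """Distillation loss plus L1 sparsity on all channel gates."""
    l1 = sum(m.gate.abs().sum() for m in student.modules()
             if isinstance(m, GatedConv))
    return kd_loss(student_logits, teacher_logits) + lambda_l1 * l1

# Usage example with a toy gated network and random data.
student = nn.Sequential(GatedConv(3, 16), nn.AdaptiveAvgPool2d(1),
                        nn.Flatten(), nn.Linear(16, 10))
x = torch.randn(8, 3, 32, 32)
teacher_logits = torch.randn(8, 10)  # stand-in for a teacher's outputs
loss = loss_fn(student, student(x), teacher_logits)
loss.backward()
```

After training, channels whose gates have shrunk to near zero would be removed, yielding the compact student on which the paper reports CIFAR results.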
Abstract
Knowledge distillation, as a model compression technique, has received great attention. The knowledge of a well-performed teacher is distilled to a student with a small architecture. The architecture of the small student is often chosen to be similar to their teacher's, with fewer layers or fewer channels, or both.