The Efficacy of Knowledge Distillation

Oct, 2019

On the Efficacy of Knowledge Distillation

Jang Hyun Cho, Bharath Hariharan

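TL;DR: This study evaluates the efficacy of knowledge distillation and its dependence on student and teacher architectures. It finds that more accurate teachers are not necessarily good teachers, and that larger models are not always better teachers, which leads to a capacity mismatch between teacher and student. The study shows that stopping the teacher's training early can mitigate this effect, and that these results hold across a range of datasets and models.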

Abstract

In this paper, we present a thorough evaluation of the efficacy of knowledge distillation and its dependence on student and teacher architectures. Starting with the observation that more accurate teachers often don't make good teachers, we attempt to tease apart the factors that affect