Jun, 2021
Does Knowledge Distillation Really Work?
Samuel Stanton, Pavel Izmailov, Polina Kirichenko, Alexander A. Alemi, Andrew Gordon Wilson
TL;DR
The study shows that while knowledge distillation can help the student network generalize better, the student usually fails to fully match the teacher model's predictive distribution, and this gap is largely due to optimization difficulties. Details of the distillation dataset also affect how closely the student matches the teacher, and closer agreement with the teacher does not necessarily lead to better generalization.
Abstract
Knowledge distillation is a popular technique for training a small student network to emulate a larger teacher model, such as an ensemble …
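For reference, distillation in this setting is typically trained with the soft-label objective of Hinton et al. (2015): a KL divergence between temperature-softened teacher and student predictions. The sketch below is a minimal PyTorch-style illustration of that objective, not the paper's exact training setup; the `student`, `teacher`, and `inputs` names and the temperature value are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Soft-label distillation loss: KL divergence between the
    temperature-softened teacher and student predictive distributions,
    scaled by T^2 so gradient magnitudes stay comparable across temperatures."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Hypothetical usage on one batch, with a frozen teacher:
# with torch.no_grad():
#     teacher_logits = teacher(inputs)
# loss = distillation_loss(student(inputs), teacher_logits)
# loss.backward()
```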