BriefGPT.xyz
Mar, 2022
Knowledge Distillation: Bad Models Can Be Good Role Models
Gal Kaplun, Eran Malach, Preetum Nakkiran, Shai Shalev-Shwartz
TL;DR
Shows that a student network distilled from a large neural network trained in the overparameterized regime, which acts as a conditional sampler, can approach the Bayes-optimal classifier, and proves that several common learning algorithms (such as nearest neighbors and kernel machines) become conditional samplers when applied in the overparameterized regime.
Abstract
Large neural networks trained in the overparameterized regime are able to fit noise to zero train error. Recent work (Nakkiran & Bansal, 2020) has empirically observed that such networks behave as "conditional samplers" from the noisy distribution.
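The mechanism summarized above can be illustrated with a minimal numerical sketch. The following toy simulation is an assumption-laden illustration, not the paper's construction: a 1-nearest-neighbor "teacher" interpolates noisy labels (zero train error) and, at a query point, effectively returns one sample from the noisy conditional label distribution. Averaging many such teacher samples, as distillation targets, estimates the true conditional probability, and thresholding it recovers the Bayes-optimal rule. All function names and the choice of a piecewise-constant conditional probability are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def eta(x):
    # True conditional probability P(y=1|x): 0.25 on [0, 0.5], 0.75 on (0.5, 1]
    return 0.25 + 0.5 * (x > 0.5)

def teacher_predict(xq, n=200):
    # Draw a fresh noisy training set; a 1-NN teacher fits it to zero train
    # error, yet its prediction at xq is (approximately) one sample from
    # P(y|x) near xq -- i.e., the interpolating teacher is a conditional sampler.
    x = rng.uniform(0, 1, n)
    y = (rng.uniform(0, 1, n) < eta(x)).astype(int)
    idx = np.abs(x[:, None] - xq[None, :]).argmin(axis=0)  # nearest neighbor
    return y[idx]

# Query points, avoiding the decision boundary at x = 0.5
xq = np.linspace(0.05, 0.95, 10)

# Distillation target: average many teacher samples to estimate eta(x),
# then threshold the estimate to get the student's hard classifier.
soft = np.mean([teacher_predict(xq) for _ in range(500)], axis=0)
student = (soft > 0.5).astype(int)
bayes = (eta(xq) > 0.5).astype(int)
print("student agrees with Bayes classifier:", (student == bayes).all())
```

Away from the decision boundary, the averaged teacher samples concentrate around eta(x), so the thresholded student matches the Bayes-optimal classifier even though each individual teacher fits pure noise.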