Jun, 2022
Revisiting Label Smoothing and Knowledge Distillation Compatibility: What was Missing?
Keshigeyan Chandrasegaran, Ngoc-Trung Tran, Yunqing Zhao, Ngai-Man Cheung
TL;DR
Through extensive experiments, analyses, and case studies, this work identifies and validates systematic diffusion as the key concept for understanding and resolving the contradictory findings on the compatibility of label smoothing and knowledge distillation, and accordingly recommends using a label-smoothed teacher together with a low-temperature transfer to obtain high-performing student models.
Abstract
This work investigates the compatibility between label smoothing (LS) and knowledge distillation (KD). Contemporary findings addressing this thesis statement take dichotomous standpoints: Muller et al. (2019) and