Ensembles of deep neural networks have demonstrated superior performance, but their heavy computational cost hinders applying them for resource-limited environments. It motivates distilling knowledge from the ensemble teacher into a smaller student network, and there are two important design choices for this ensemble distillation: 1) how to construct the student network, and 2) what data should be shown during training. In this paper, we propose a weight averaging technique where a student with multiple subnetworks is trained to absorb the functional diversity of ensemble teachers, but then those subnetworks are properly averaged for inference, giving a single student network with no additional inference cost. We also propose a perturbation strategy that seeks inputs from which the diversities of teachers can be better transferred to the student. Combining these two, our method significantly improves upon previous methods on various image classification tasks.

该论文介绍了一种基于权重平均技术和扰动策略的集成神经网络蒸馏方法，有效地将多个教师网络的功能多样性吸收到一个适合资源受限环境中使用的学生网络中，并在多个图像分类任务上显著提高了性能。

利用加权平均和多样化扰动改进集成蒸馏