Dec, 2020
ALP-KD: Attention-Based Layer Projection for Knowledge Distillation
Peyman Passban, Yimeng Wu, Mehdi Rezagholizadeh, Qun Liu
TL;DR
This work studies knowledge distillation for neural networks. It proposes an attention-based combination technique that fuses information from the teacher and student networks while accounting for the importance of each layer, performing distillation at the intermediate layers. Experiments show that this technique outperforms other existing techniques.
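The TL;DR describes the mechanism only at a high level. As one way to make it concrete, the sketch below computes dot-product attention from each student layer over all teacher layers, fuses the teacher layers by those weights, and applies an MSE matching loss on the fused representation. This is a minimal sketch, not the paper's exact formulation: the function name `alp_kd_loss`, the mean-pooled `[batch, hidden]` layer representations, and the plain dot-product scoring are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def alp_kd_loss(student_layers, teacher_layers):
    """Attention-based layer-projection KD loss (illustrative sketch).

    student_layers: list of [batch, hidden] tensors (e.g. mean-pooled states)
    teacher_layers: list of [batch, hidden] tensors, same hidden size assumed
    """
    teacher = torch.stack(teacher_layers, dim=1)           # [batch, T, hidden]
    loss = 0.0
    for h_s in student_layers:                             # h_s: [batch, hidden]
        # Score each teacher layer against this student layer.
        scores = torch.einsum('bh,bth->bt', h_s, teacher)  # [batch, T]
        alpha = F.softmax(scores, dim=-1)                  # per-layer importance
        # Fuse teacher layers into one target representation.
        fused = torch.einsum('bt,bth->bh', alpha, teacher)
        loss = loss + F.mse_loss(h_s, fused)
    return loss / len(student_layers)

# Example: 4-layer student distilled from a 12-layer teacher, hidden size 768.
student = [torch.randn(8, 768) for _ in range(4)]
teacher = [torch.randn(8, 768) for _ in range(12)]
print(alp_kd_loss(student, teacher))
```

Because every teacher layer contributes to every student layer's target, this avoids committing to a fixed layer-to-layer mapping, which is the motivation the TL;DR points at.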
Abstract
Knowledge distillation is considered as a training and compression strategy in which two neural networks, namely a teacher and a student, …