Knowledge distillation (KD) aims to improve the performance of a student
network by mimicking the knowledge of a powerful teacher network. Existing
methods focus on studying what knowledge should be transferred and treat all
samples equally during training. This paper introduces the ad