Knowledge distillation (KD) has shown great potential for transferring
knowledge from a complex teacher model to a simple student model, so that a
heavy learning task can be accomplished efficiently without losing too much
prediction accuracy. Recently, many attempts have been made