Knowledge distillation (KD) is a powerful technique for transferring
knowledge between neural network models, where a pre-trained teacher model is
used to facilitate the training of the target student model. However, the
availability of a suitable teacher model is not always guaranteed.
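
For concreteness, the following is a minimal sketch of the classical KD objective in PyTorch, in which the student is trained on a weighted sum of the hard-label cross-entropy and a temperature-softened KL divergence against the teacher's outputs; the `temperature` and `alpha` values shown are illustrative defaults, not prescribed by this work.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Classical KD loss: alpha * CE(student, labels)
    + (1 - alpha) * T^2 * KL(teacher_soft || student_soft)."""
    # Hard-label term: standard cross-entropy on ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    # Soft-label term: KL divergence between temperature-softened
    # distributions; the T^2 factor keeps its gradient magnitude
    # comparable to the cross-entropy term.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * ce + (1.0 - alpha) * kd
```

In a training loop, the teacher runs in inference mode (`with torch.no_grad(): teacher_logits = teacher(x)`) so that only the student receives gradients.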