Knowledge distillation~(KD) aims to craft a compact student model that
imitates the behavior of a pre-trained teacher in a target domain. Prior KD
approaches, despite their encouraging results, have largely relied on the
premise that \emph{in-domain} data is available to carry out the knowledge
transfer.