Unlike existing knowledge distillation methods, which focus on baseline settings where the teacher models and training strategies are not as strong and competitive as state-of-the-art approaches, this paper presents a method dubbed DIST to distill better from a stronger teacher. We empir