With the development of deep learning, advanced dialogue generation methods typically demand substantial computational resources. One promising approach to obtaining a lightweight yet high-performing model is knowledge distillation, which relies heavily on a powerful pre-trained teacher model.
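To make the idea concrete, the following is a minimal sketch of the standard knowledge-distillation objective (Hinton et al., 2015), in which a student is trained to match the teacher's softened output distribution; the `student_logits` and `teacher_logits` tensors are hypothetical placeholders, and this is an illustration of generic distillation rather than the specific method proposed here.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student outputs."""
    # Soften both distributions with the temperature so the teacher's
    # "dark knowledge" (relative probabilities of wrong classes) is exposed.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher,
                    reduction="batchmean") * temperature ** 2
```

In practice this term is usually combined with the ordinary cross-entropy loss on the ground-truth targets, weighted by a mixing coefficient.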