Applying knowledge distillation encourages a student model to behave more like a teacher model, largely retaining the performance of the teacher model, even though the student model may have substantially fewer parameters. However, while distillation helps student models behave more like teacher models in-distribution, this is not necessarily the case out-of-distribution. To address this, we use a language model to create task-specific unlabeled data that mimics the data in targeted out-of-distribution domains. We use this generated data for knowledge distillation on the task of Natural Language Inference (NLI), encouraging the student models to behave more like the teacher models for these examples. Our domain-targeted augmentation is highly effective, and outperforms previous robustness methods when evaluating out-of-distribution performance on MNLI. Surprisingly, this method also improves performance on out-of-distribution domains that the data was not generated for. We additionally introduce Distilled Minority Upsampling (DMU), a method for identifying and upsampling minority examples during the distillation. DMU is complementary to the domain-targeted augmentation, and substantially improves performance on SNLI-hard. Finally, we show out-of-distribution improvements on HANS from both of our methods, despite augmenting the training data with fewer than 5k examples.

本文介绍了一种基于知识蒸馏的方法和使用语言模型创建任务特定无标签数据增强来提高模型在NLI任务中的性能，在out-of-distribution方面的表现超过以前的方法。通过Distilled Minority Upsampling来鲁棒性提高了SNLI-hard的表现。

使用领域针对性数据增强来提升知识蒸馏的鲁棒性