We study the problem of dataset distillation: creating a small set of synthetic examples capable of training a good model. In particular, we study label distillation, which creates synthetic labels for a small set of real images, and show that it is more effective than the prior image-based approach to dataset distillation. Interestingly, label distillation can be applied across datasets, for example enabling a model to learn Japanese character recognition by training only on synthetically labeled English letters. Methodologically, we introduce a more robust and flexible meta-learning algorithm for distillation, as well as an effective first-order strategy based on convex optimization layers. Distilling labels with our new algorithm leads to improved results over prior image-based distillation. More importantly, it leads to clear improvements in the flexibility of the distilled dataset, in terms of compatibility with off-the-shelf optimizers and diverse neural architectures.

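To address the dataset distillation problem, we propose training models with synthetic labels, which is more effective than image-based methods. We introduce a more robust and flexible meta-learning algorithm, along with a first-order strategy based on convex optimization layers. The new algorithm improves model performance and is compatible with a range of optimizers and different neural architectures. We also find that label distillation can be applied across datasets, for example learning Japanese character recognition by training only on synthetically labeled English letters.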
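To make the idea of learning labels for fixed examples more concrete, here is a minimal sketch. It is not the authors' implementation. It assumes the convex inner problem is ridge regression on fixed feature vectors, so the model weights are a closed-form function of the labels and the labels can be updated by plain gradient descent on a target-set loss. The helper names (`ridge_operator`, `distill_labels`, `predict`), the toy data, and the hyperparameters are all hypothetical.

```python
# Sketch only: label distillation with a closed-form ridge-regression inner solver.
# Names, data, and hyperparameters are illustrative assumptions, not the paper's code.

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def solve(M, B):
    """Solve M X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def ridge_operator(X, lam):
    """Return A = (X^T X + lam I)^{-1} X^T, so ridge weights are W = A @ Y."""
    Xt = transpose(X)
    G = matmul(Xt, X)
    for i in range(len(G)):
        G[i][i] += lam
    return solve(G, Xt)

def mse(P, T):
    """Squared error summed over classes, averaged over examples."""
    return sum((p - t) ** 2 for rp, rt in zip(P, T) for p, t in zip(rp, rt)) / len(P)

def distill_labels(base_X, target_X, target_Y, lam=1.0, lr=1.0, steps=200):
    A = ridge_operator(base_X, lam)   # inner problem solved in closed form
    M = matmul(target_X, A)           # target predictions are M @ Y
    Mt = transpose(M)
    m, n_classes = len(target_X), len(target_Y[0])
    Y = [[0.0] * n_classes for _ in base_X]   # learnable synthetic labels
    for _ in range(steps):
        P = matmul(M, Y)
        R = [[p - t for p, t in zip(rp, rt)] for rp, rt in zip(P, target_Y)]
        grad = matmul(Mt, R)          # gradient of the loss w.r.t. Y, without the 2/m factor
        Y = [[y - lr * 2.0 / m * g for y, g in zip(ry, rg)]
             for ry, rg in zip(Y, grad)]
    return Y, A

def predict(A, Y, X):
    scores = matmul(X, matmul(A, Y))
    return [max(range(len(s)), key=s.__getitem__) for s in scores]

# Toy setup: the base points come from a different "dataset" than the targets.
# The last feature is a constant bias term.
base_X = [[1.0, 0.0, 1.0], [-1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [0.0, -1.0, 1.0]]
target_X = [[-1.0, 0.5, 1.0], [-1.0, -0.5, 1.0], [1.0, 0.5, 1.0], [1.0, -0.5, 1.0]]
target_Y = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

labels, A = distill_labels(base_X, target_X, target_Y)
final_loss = mse(matmul(target_X, matmul(A, labels)), target_Y)
predictions = predict(A, labels, target_X)
```

In this sketch only `labels` is learned; the base points stay fixed, which mirrors the abstract's contrast between distilling labels and distilling images. The learned labels are free to be soft and need not be one-hot. A realistic version would presumably compute the features with a neural network and differentiate through it, which is omitted here.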

Dataset Distillation: Learning Labels Instead of Images