Most dataset distillation methods struggle to accommodate large-scale datasets due to their substantial computational and memory requirements. In this paper, we present a curriculum-based dataset distillation framework designed to harmonize scalability with efficiency. This framework strategically distills synthetic images, adhering to a curriculum that transitions from simple to complex. By incorporating curriculum evaluation, we address the issue of previous methods generating images that tend to be homogeneous and simplistic, doing so at a manageable computational cost. Furthermore, we introduce adversarial optimization towards synthetic images to further improve their representativeness and safeguard against their overfitting to the neural network involved in distilling. This enhances the generalization capability of the distilled images across various neural network architectures and also increases their robustness to noise. Extensive experiments demonstrate that our framework sets new benchmarks in large-scale dataset distillation, achieving substantial improvements of 11.1\% on Tiny-ImageNet, 9.0\% on ImageNet-1K, and 7.3\% on ImageNet-21K. The source code will be released to the community.

本论文提出了一个基于课程设置的数据集蒸馏框架，通过将从简单到复杂的课程进行逐步蒸馏，合理处理合成图像的生成和评估，进一步利用对抗优化改进图像的代表性，提高其在不同神经网络体系结构下的泛化能力和抗噪性，实现了大规模数据集蒸馏的新突破。

课程数据蒸馏