BriefGPT.xyz
Oct, 2024
Are Large-scale Soft Labels Necessary for Large-scale Dataset Distillation?
Lingao Xiao, Yang He
TL;DR
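This study examines whether large-scale soft labels are necessary for large-scale dataset distillation, focusing on the excessive within-class similarity that arises when datasets are condensed. Introducing class-wise supervision during image synthesis significantly increases within-class diversity, which in turn reduces the need for soft labels. With this approach, the required soft-label storage can be compressed from 113 GB to 2.8 GB while performance improves by 2.6%.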
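To put the storage figures in perspective, the short sketch below works out the compression factor implied by the two sizes quoted in the TL;DR (113 GB and 2.8 GB). The function names are illustrative only and do not come from the paper.

```python
# Storage figures quoted in the TL;DR (in GB).
FULL_SOFT_LABELS_GB = 113.0
COMPRESSED_SOFT_LABELS_GB = 2.8


def compression_factor(before_gb: float, after_gb: float) -> float:
    """Return how many times smaller the compressed storage is."""
    return before_gb / after_gb


def storage_saved_percent(before_gb: float, after_gb: float) -> float:
    """Return the share of the original storage that is no longer needed."""
    return (1.0 - after_gb / before_gb) * 100.0


factor = compression_factor(FULL_SOFT_LABELS_GB, COMPRESSED_SOFT_LABELS_GB)
saved = storage_saved_percent(FULL_SOFT_LABELS_GB, COMPRESSED_SOFT_LABELS_GB)
print(f"Compression factor: about {factor:.1f}x")
print(f"Storage saved: about {saved:.1f}%")
```

Under these figures the soft labels shrink by roughly a factor of 40.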
Abstract
In ImageNet-condensation, the storage for auxiliary soft labels exceeds that of the condensed dataset by over 30 times. However, are large-scale