Dataset distillation aims to compress a dataset into a much smaller one such that a model trained on the distilled dataset achieves high accuracy. Current methods frame this as maximizing the distilled classification accuracy under a budget of K distilled images per class, where K is a pos