Diffusion models excel at generating photorealistic images from text queries. Naturally, many approaches have been proposed to use these generative abilities to augment training datasets for downstream tasks, such as classification. However, diffusion models are themselves trained on large, noisily supervised but nonetheless annotated datasets. It is an open question whether the generalization capabilities of diffusion models improve downstream performance beyond what could be gained by simply using their pre-training data for augmentation. We perform a systematic evaluation of existing methods for generating images from diffusion models and study new extensions to assess their benefit for data augmentation. While we find that personalizing diffusion models towards the target data outperforms simpler prompting strategies, we also show that using the training data of the diffusion model alone, via a simple nearest neighbor retrieval procedure, leads to even stronger downstream performance. Overall, our study probes the limitations of diffusion models for data augmentation but also highlights their potential in generating new training data to improve performance on simple downstream vision tasks.

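This paper systematically evaluates existing methods for generating images with diffusion models and studies new extensions to assess their benefit for data augmentation. The authors find that personalizing diffusion models to the target data outperforms simple prompting strategies, but that using the diffusion model's training data directly, via a simple nearest neighbor retrieval procedure, yields even stronger downstream performance. The study exposes the limitations of diffusion models for data augmentation while also highlighting their potential to generate new training data that improves performance on simple downstream vision tasks.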
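The text only names "a simple nearest neighbor retrieval procedure", so the following is a minimal sketch under our own assumptions, not the authors' implementation. We assume every image has already been embedded into a shared feature space by some encoder (left unspecified here), that the retrieval pool stands in for the diffusion model's training data, and that a few target-dataset images per class serve as queries. Each retrieved pool image inherits the label of the class whose query retrieved it. The function names, the cosine-similarity metric, and the rule that an image claimed by one class is not reassigned are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(query: Vector, pool: Sequence[Vector], k: int) -> List[int]:
    """Return indices of the k pool vectors most similar to `query`, best first."""
    scored = [(cosine_similarity(query, p), i) for i, p in enumerate(pool)]
    # Highest similarity first; ties broken by the smaller pool index.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scored[:k]]


def build_retrieval_augmentation(
    class_queries: Dict[str, List[Vector]],
    pool: Sequence[Vector],
    k: int,
) -> List[Tuple[int, str]]:
    """Hypothetical helper: label retrieved pool images with the querying class.

    Each pool index is kept at most once; the first class (in dict order)
    to retrieve it claims it, so the augmented set has no duplicate images.
    """
    taken: Set[int] = set()
    augmented: List[Tuple[int, str]] = []
    for label, queries in class_queries.items():
        for q in queries:
            for idx in nearest_neighbors(q, pool, k):
                if idx not in taken:
                    taken.add(idx)
                    augmented.append((idx, label))
    return augmented


if __name__ == "__main__":
    # Toy 2-D "embeddings": a stand-in pool and one query image per class.
    pool = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.7, 0.7]]
    class_queries = {"cat": [[1.0, 0.05]], "dog": [[0.05, 1.0]]}
    print(build_retrieval_augmentation(class_queries, pool, k=2))
    # → [(0, 'cat'), (1, 'cat'), (2, 'dog'), (3, 'dog')]
```

The retrieved `(pool_index, label)` pairs would then be added to the real training set of the downstream classifier. A real pool would be far too large for this brute-force scan, so an approximate nearest neighbor index would be the practical choice; the linear scan is kept here only to make the retrieval logic explicit.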

A Data Augmentation Perspective on Diffusion Models and Retrieval