Generative diffusion models provide strong priors for text-to-image generation and thereby serve as a foundation for conditional generation tasks such as image editing, restoration, and super-resolution. However, one major limitation of diffusion models is their slow sampling. To address this challenge, we present a novel conditional distillation method that supplements the diffusion priors with image conditions, allowing conditional sampling in very few steps. We distill the unconditionally pre-trained model directly in a single stage through joint learning, greatly simplifying previous two-stage procedures that perform distillation and conditional fine-tuning separately. Furthermore, our method enables a new parameter-efficient distillation mechanism that distills each task with only a small number of additional parameters combined with a shared, frozen unconditional backbone. Experiments across multiple tasks, including super-resolution, image editing, and depth-to-image generation, demonstrate that our method outperforms existing distillation techniques at the same sampling time. Notably, our method is the first distillation strategy that can match the performance of the much slower fine-tuned conditional diffusion models.

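This work presents a novel conditional distillation method that supplements the prior of a diffusion model with image conditions, greatly simplifying earlier two-stage distillation procedures. By combining a small number of additional parameters with a frozen unconditional backbone, it enables a new parameter-efficient distillation mechanism. Experiments show that the method outperforms existing distillation techniques on multiple tasks and that it is the first distillation strategy to match the performance of the much slower fine-tuned conditional diffusion models.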

Conditional Diffusion Distillation