Diffusion models such as Stable Diffusion (SD) can generate high-resolution images with diverse features, but at a significant computational and memory cost. In classifier-free guided diffusion models, inference is slow largely because each denoising step requires two diffusion model evaluations, one conditional and one unconditional. Recent work has shown that distillation can reduce inference time by teaching the model to perform similar denoising steps with less computation. However, distillation introduces additional memory overhead on top of these already resource-intensive models, which makes it less practical. To address these challenges, we explore an approach that combines Low-Rank Adaptation (LoRA) with model distillation to compress diffusion models efficiently. This approach reduces inference time, mitigates the memory overhead of distillation, and notably decreases memory consumption even before distillation is applied. In our experiments, the distillation process yields a significant reduction in inference time, together with a 50% reduction in memory consumption. Our examination of the generated images indicates that LoRA-enhanced distillation maintains image quality and alignment with the provided prompts. In summary, while conventional distillation tends to increase memory consumption, LoRA-enhanced distillation reduces both inference time and memory use without compromising quality in our evaluation.
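As a rough illustration of the two ideas the abstract combines, the sketch below shows one common parameterization of classifier-free guidance (which is why two predictions are needed per denoising step) and the trainable-parameter count of a rank-r LoRA update. The function names, the guidance formula variant, and the 768x768 layer size are assumptions made for illustration; they are not taken from the paper, and the parameter ratio shown is not the paper's 50% memory figure.

```python
def cfg_combine(eps_cond, eps_uncond, guidance_scale):
    """Classifier-free guidance (one common form): start from the
    unconditional prediction and move toward the conditional one."""
    return [u + guidance_scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


def lora_trainable_params(d_out, d_in, rank):
    """Trainable parameters of a LoRA update B @ A, where B has shape
    (d_out, rank) and A has shape (rank, d_in)."""
    return rank * (d_out + d_in)


# Two noise predictions per step are needed to form the guided prediction.
# Toy 2-element "predictions" stand in for full noise tensors.
guided = cfg_combine(eps_cond=[1.0, 2.0], eps_uncond=[0.5, 1.0], guidance_scale=2.0)

# Hypothetical 768x768 projection adapted with rank 4 (illustrative sizes only).
full = 768 * 768                                 # parameters in the dense weight
lora = lora_trainable_params(768, 768, rank=4)   # parameters in the adapter
```

A distilled student aims to reproduce the guided prediction with a single evaluation, while LoRA restricts training to the small adapter matrices instead of the full weights.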

LoRA-Enhanced Distillation on Guided Diffusion Models