Conditional image synthesis based on user-specified requirements is a key component in creating complex visual content. In recent years, diffusion-based generative modeling has become a highly effective approach to conditional image synthesis, leading to rapid growth in the literature. However, the complexity of diffusion-based modeling, the wide range of image synthesis tasks, and the diversity of conditioning mechanisms make it difficult for researchers to keep up with rapid developments and to understand the core concepts of this topic. In this survey, we categorize existing works based on how conditions are integrated into the two fundamental components of diffusion-based modeling, i.e., the denoising network and the sampling process. We specifically highlight the underlying principles, advantages, and potential challenges of various conditioning approaches in the training, re-purposing, and specialization stages used to construct a desired denoising network. We also summarize six mainstream conditioning mechanisms in the sampling process. All discussions are centered around popular applications. Finally, we pinpoint some critical yet still open problems and suggest possible solutions. Our reviewed works are itemized at https://github.com/zju-pi/Awesome-Conditional-Diffusion-Models.

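This work addresses the complexity and rapid development of conditional image synthesis by systematically categorizing the existing literature and examining how conditions are integrated into the denoising network and the sampling process of diffusion models. Its core contribution is an analysis of the principles, advantages, and drawbacks of various conditioning approaches in the training and specialization stages, together with a summary of six mainstream conditioning mechanisms. The survey gives researchers an in-depth understanding of the field and identifies key open problems along with potential solutions.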

Conditional Image Synthesis with Diffusion Models: A Survey