Recent advancements in Text-to-Image (T2I) diffusion models have demonstrated impressive success in generating high-quality images with zero-shot generalization capabilities. Yet, current models struggle to closely adhere to prompt semantics, often misrepresenting or overlooking specific attributes. To address this, we propose a simple, training-free approach that modulates the guidance direction of diffusion models during inference. We first decompose the prompt semantics into a set of concepts, and monitor the guidance trajectory in relation to each concept. Our key observation is that deviations in model's adherence to prompt semantics are highly correlated with divergence of the guidance from one or more of these concepts. Based on this observation, we devise a technique to steer the guidance direction towards any concept from which the model diverges. Extensive experimentation validates that our method improves the semantic alignment of images generated by diffusion models in response to prompts. Project page is available at: https://korguy.github.io/

最近的文本到图像(T2I)扩散模型的进展在生成具有零样本泛化能力的高质量图像方面取得了令人印象深刻的成功。然而，当前的模型在紧密遵循提示语义方面存在困难，通常会误代或忽视特定属性。为了解决这个问题，我们提出了一种简单的、无需训练的方法，在推理过程中调节扩散模型的引导方向。我们首先将提示语义分解为一组概念，并监控与每个概念相关的引导轨迹。我们的关键观察是，模型在遵循提示语义方面的偏离与引导从一个或多个概念偏离的差异高度相关。基于这一观察，我们设计了一种技术，将引导方向引导至模型偏离的任何概念。广泛的实验验证了我们的方法可以改善扩散模型对提示的语义对齐。项目页面可在此链接上找到: this https URL

文本到图像扩散模型的语义引导调整