Recent advancements in diffusion models have showcased their impressive
capacity to generate visually striking images. Nevertheless, ensuring a close
match between the generated image and the given prompt remains a persistent
challenge. In this work, we identify that a crucial factor l