Diffusion-driven text-to-image (T2I) generation has made remarkable
progress. To further improve T2I models' capability in numerical and
spatial reasoning, the layout is employed as an intermediary to bridge large
language models and layout-based diffusion models. However, these