Abstract
Diffusion-driven text-to-image (T2I) generation has achieved remarkable advancements. To further improve T2I models' capability in
numerical and spatial reasoning, layouts are employed as an intermediary to bridge large language models and