Recent approaches have achieved great success in image generation from
structured inputs, e.g., semantic segmentation maps, scene graphs, or layouts. Although
these methods allow objects and their locations to be specified at the
image level, they lack the fidelity and semantic control to specify