Recent years have seen notable progress in generating images from scene-based
text descriptions. These approaches, however, have primarily focused on generating
images from a static text description and are limited to producing an image in a
single pass. They cannot generate an
image interactively based on an incrementally additive t