Asking questions about visual environments is a crucial way for intelligent agents to understand rich, multi-faceted scenes, which raises the importance of visual question generation (VQG) systems. Apart from being grounded in the image, existing VQG systems can use textual →