scene graph generation aims to identify objects and their relations in
images, providing structured image representations that can facilitate numerous
applications in computer vision. However, scene graph models usually require
supervised learning on large quantities of labeled data wi