scene synthesis is a challenging problem with several industrial
applications. Recently, substantial efforts have been directed toward synthesizing
scenes using human motions, room layouts, or spatial graphs as input.
However, few studies have addressed this problem from multiple moda