Solving grounded language tasks often requires reasoning about relationships
between objects in the context of a given task. For example, to answer the
question "What color is the mug on the plate?", we must first identify the
specific mug that satisfies the "on" relationship with respect to the plate
and then report its color.
Recent work has proposed various methods capable of c