One of the most difficult tasks in scene understanding is recognizing
interactions between objects in an image. This task is often called visual
relationship detection (VRD). We consider the question of whether, given
auxiliary textual data in addition to the standard visual data used