Localizing phrases in images is an important part of image understanding and
can be useful in many applications that require mappings between textual and
visual information. Existing work attempts to learn these mappings from
examples of phrase-image region correspondences (strong supe