In this paper we approach the novel problem of segmenting an image based on a
natural language expression. This is different from traditional semantic
segmentation over a predefined set of semantic classes, as e.g., the phrase
"two men sitting on the right bench" requires segmenting only the two people on
the right bench and no one standing or sitting on ano