We introduce the task of image-set visual question answering (ISVQA), which
generalizes the commonly studied single-image VQA problem to multi-image
settings. Taking a natural language question and a set of images as input, it
aims to answer the question based on the content of the ima