Deep learning has achieved impressive results in many areas of Computer Vision and Natural Language Processing. Among others, visual question answering (VQA), also referred to as a visual Turing test, is considered one of the most compelling problems, and recent deep learning models hav