In this paper, we aim to improve attention for the visual question answering
(VQA) task. Providing supervision for attention, however, is challenging.
We observe that visual explanations obtained through class activation mapping
(specifically Grad-CAM) that are meant