visual question answering (VQA) has traditionally been treated as a
single-step task where each question receives the same amount of effort, unlike
natural human question-answering strategies. We explore a question
decomposition strategy for VQA to overcome this limitation. We probe th