Humans have the innate capability to answer diverse questions, which is
rooted in the natural ability to correlate different concepts based on their
semantic relationships and decompose difficult problems into sub-tasks. On the
contrary, existing visual reasoning methods assume trainin