knowledge-based visual question answering (VQA) involves answering questions
that require external knowledge not present in the image. Existing methods
first retrieve knowledge from external resources, then reason over the selected
knowledge, the input image, and question for answer pr