We present a framework that formulates visual question answering as modular
code generation. In contrast to prior work on modular approaches to VQA, our
approach requires no additional training and relies on pre-trained language
models (LMs), visual models pre-trained on image-caption