There has been rapid progress in the task of visual question answering with
improved model architectures. Unfortunately, these models are usually
computationally intensive due to their sheer size, which poses a serious
challenge for deployment. We aim to tackle this issue for the spec