We present an empirical study of active learning for Visual Question Answering, where a deep VQA model selects informative question-image pairs from a pool and queries an oracle for answers to maximally improve its performance under a limited query budget. Drawing analogies from human learning, we explore cramming (entropy), curiosity-driven (expected model change), and goal-driven (expected error reduction) active learning approaches, and propose a fast and effective goal-driven active learning scoring function to pick question-image pairs for deep VQA models under the Bayesian Neural Network framework. We find that deep VQA models need large amounts of training data before they can start asking informative questions. But once they do, all three approaches outperform the random selection baseline and achieve significant query savings. For the scenario where the model is allowed to ask generic questions about images but is evaluated only on specific questions (e.g., questions whose answer is either yes or no), our proposed goal-driven scoring function performs the best.
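As a rough illustration of the entropy ("cramming") criterion mentioned above, the following minimal sketch ranks pooled question-image pairs by the entropy of the model's predicted answer distribution and queries the top ones within a budget. It assumes the model exposes a probability vector over candidate answers for each pooled pair. The names `entropy`, `select_queries`, `pool_probs`, and `budget`, as well as the toy numbers, are hypothetical and not taken from the paper. The goal-driven scoring function under the Bayesian Neural Network framework is not reproduced here.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete answer distribution."""
    # Zero-probability answers contribute nothing to the sum.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_queries(pool_probs, budget):
    """Return indices of the `budget` pool items with the highest entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical predictive distributions over three candidate answers,
# one row per unlabeled question-image pair in the pool.
pool_probs = [
    [0.98, 0.01, 0.01],  # model is confident
    [0.34, 0.33, 0.33],  # model is close to uniform
    [0.70, 0.20, 0.10],
    [0.50, 0.50, 0.00],
]

selected = select_queries(pool_probs, budget=2)
print(selected)  # indices of the pairs to send to the oracle
```

In this toy pool the near-uniform prediction is ranked first, which matches the intuition that the model should ask about the pairs it is least certain about.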

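This paper is an empirical study of active learning for Visual Question Answering. A deep VQA model selects informative question-image pairs from a pool in order to maximize its performance, using a fast and effective goal-driven active learning scoring function under the Bayesian Neural Network framework. Three active learning approaches are evaluated, and the results show that the proposed goal-driven scoring function performs best when the model is evaluated only on specific questions.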

Active Learning for Visual Question Answering: An Empirical Study