Open-Ended Visual Question-Answering

Oct 2016

Issey Masuda, Santiago Pascual de la Puente, Xavier Giro-i-Nieto

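TL;DR: This work studies methods for solving the visual question-answering task with a deep learning framework. It explores LSTM networks and uses the VGG-16 and K-CNN convolutional neural networks to extract image features, which are combined with a word embedding or sentence embedding of the question to predict the answer. The approach achieved 53.62% accuracy in the Visual Question Answering Challenge 2016.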
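
The combination step described above can be illustrated with a toy sketch. This is not the authors' model: the fusion by concatenation, the single linear layer, the vector sizes, the random weights, and the names `predict_answer` and `softmax` are all assumptions made for illustration. In a real system the image features would come from a CNN such as VGG-16 and the question embedding from an LSTM or a sentence encoder.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_answer(image_features, question_embedding, weights, bias, answers):
    """Concatenate both modalities, apply one linear layer, pick the top answer.

    Hypothetical helper: concatenation is one simple fusion choice and is
    assumed here, not taken from the paper.
    """
    fused = list(image_features) + list(question_embedding)
    scores = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    probs = softmax(scores)
    best = max(range(len(answers)), key=probs.__getitem__)
    return answers[best], probs


# Toy stand-ins with made-up sizes (4 image dims, 3 question dims).
rng = random.Random(0)
image_features = [rng.uniform(-1, 1) for _ in range(4)]
question_embedding = [rng.uniform(-1, 1) for _ in range(3)]
answers = ["yes", "no", "two"]  # tiny hypothetical answer vocabulary
weights = [[rng.uniform(-1, 1) for _ in range(7)] for _ in answers]
bias = [0.0] * len(answers)

answer, probs = predict_answer(
    image_features, question_embedding, weights, bias, answers
)
print(answer, [round(p, 3) for p in probs])
```

Treating answer prediction as classification over a fixed answer vocabulary keeps the output layer a plain softmax; with trained weights, the same function would return the most probable answer for an image and question pair.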

Abstract

This thesis report studies methods to solve visual question-answering (VQA) tasks with a deep learning framework. As a preliminary step, we explore Long Short-Term Memory (→