BriefGPT.xyz
Oct, 2016
开放式视觉问答
Open-Ended Visual Question-Answering
HTML
PDF
Issey Masuda, Santiago Pascual de la Puente, Xavier Giro-i-Nieto
TL;DR
研究使用深度学习框架解决视觉问答任务的方法,探索LSTM网络和VGG-16、K-CNN卷积神经网络提取图像特征,将其与问题的词嵌入或句子嵌入相结合进行答案预测。在Visual Question Answering Challenge 2016中获得了53.62%的准确率。
Abstract
This thesis report studies methods to solve
visual question-answering
(VQA) tasks with a
deep learning
framework. As a preliminary step, we explore Long Short-Term Memory (
→