Apr, 2019
Towards VQA Models that can Read
Amanpreet Singh, Vivek Natarajan, Meet Shah, Yu Jiang, Xinlei Chen...
TL;DR
We study the image content that blind users most often ask about when they cannot see the picture, namely the text in the image. We introduce a new model called LoRRA to address this problem, and propose a dataset called TextVQA to evaluate and improve model performance.
Abstract
Studies have shown that a dominant class of questions asked by visually impaired users on images of their surroundings involves reading text in the image. But today's VQA models can not read! Our paper takes a first step towards addressing this problem.