BriefGPT.xyz
Dec, 2020
Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps
Qi Zhu, Chenyu Gao, Peng Wang, Qi Wu
TL;DR
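This paper proposes a simple attention mechanism: OCR token features are split into separate visual and linguistic attention branches, and the results are passed to a popular Transformer decoder to generate an answer or a caption. The approach achieves new state-of-the-art results on several benchmarks, including TextVQA and ST-VQA, and for text-based image captioning it surpasses the winner of the TextCaps Challenge 2020.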
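The split-attention idea can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`attend`, `split_ocr_attention`), the use of plain dot-product attention, the fusion by concatenation, and the toy feature values are all assumptions made for illustration, and the Transformer decoder stage is omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, features):
    """Dot-product attention: average the feature rows, weighted by
    each row's similarity to the query vector."""
    scores = [sum(q * f for q, f in zip(query, row)) for row in features]
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * row[i] for w, row in zip(weights, features))
            for i in range(dim)]


def split_ocr_attention(question, ocr_visual, ocr_linguistic):
    """Hypothetical helper: attend to the visual and linguistic OCR
    features in two separate branches, then concatenate the summaries.
    In a full model the fused vector would feed a Transformer decoder."""
    visual_summary = attend(question, ocr_visual)
    linguistic_summary = attend(question, ocr_linguistic)
    return visual_summary + linguistic_summary  # list concatenation


# Toy inputs (assumed values): two OCR tokens, 2-dimensional features.
question = [1.0, 0.0]
ocr_visual = [[1.0, 0.0], [0.0, 1.0]]      # e.g. appearance features
ocr_linguistic = [[0.5, 0.5], [2.0, 0.0]]  # e.g. word-embedding features
fused = split_ocr_attention(question, ocr_visual, ocr_linguistic)
```

Keeping the two branches separate lets the question weight the same OCR token differently depending on whether its appearance or its text content is relevant.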
Abstract
Texts appearing in daily scenes that can be recognized by OCR (Optical Character Recognition) tools contain significant information, such as street name, product brand and prices. Two tasks -- text-based visual question …