BriefGPT.xyz
Jul, 2022
Weakly Supervised Grounding for VQA in Vision-Language Transformers
Aisha Urooj Khan, Hilde Kuehne, Chuang Gan, Niels Da Vitoria Lobo, Mubarak Shah
TL;DR
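This paper proposes a Transformer-based approach to grounding for visual question answering in a weakly supervised setting. The method groups each visual token into capsules and uses self-attention to mask those capsules, which alleviates the limitations of relying on object detection. Experiments show that the method achieves new state-of-the-art results.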
Abstract
Transformers for visual-language representation learning have been getting a lot of interest and shown tremendous performance on visual question ...