BriefGPT.xyz
Dec, 2021
Deconfounded Visual Grounding
Jianqiang Huang, Yu Qin, Jiaxin Qi, Qianru Sun, Hanwang Zhang
TL;DR
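By constructing a causal graph, the authors break the language-location confounding bias in the visual grounding process and propose a new deconfounded visual grounding method, Referring Expression Deconfounder (RED), which achieves significant improvements across a range of benchmarks.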
Abstract
We focus on the confounding bias between language and location in the visual grounding pipeline, where we find that the bias is the major visual reasoning bottleneck. For example, the grounding process is usually