BriefGPT.xyz
Apr, 2022
Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning
Li Yang, Yan Xu, Chunfeng Yuan, Wei Liu, Bing Li...
TL;DR
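This work proposes a transformer-based visual grounding framework that achieves accurate grounding by establishing text-conditioned discriminative features and performing multi-stage cross-modal reasoning. It introduces a text-guided visual context encoder and a multi-stage decoder to reach state-of-the-art performance.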
Abstract
Visual grounding is a task to locate the target indicated by a natural language expression. Existing methods extend the generic object detection framework to this problem. They base the visual grounding on the fe