BriefGPT.xyz
Aug, 2023
Language-Guided Diffusion Model for Visual Grounding
Sijia Chen, Baochun Li
TL;DR
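The language-guided diffusion framework for visual grounding (LG-DVG) uses denoising diffusion modeling to perform visual grounding through step-by-step reasoning, progressively refining query-region matching. It treats visual grounding, a cross-modal alignment task, as a generative problem, and its strong performance is validated on multiple datasets.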
Abstract
Visual grounding (VG) tasks involve explicit cross-modal alignment, as semantically corresponding image regions are to be located for the language phrases provided. Existing approaches complete such visual-text r