BriefGPT.xyz
Dec, 2023
Improved Visual Grounding through Self-Consistent Explanations
Ruozhen He, Paola Cascante-Bonilla, Ziyan Yang, Alexander C. Berg, Vicente Ordonez
TL;DR
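Fine-tunes a vision-and-language model using a visual explanation method and synonyms, with the goal of improving localization ability and the quality of object highlighting. Across multiple datasets, the approach yields significant improvements over baseline methods and prior work.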
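As a rough illustration only, the sketch below assumes that a visual explanation method produces one 2D heatmap per text phrase. It shows two things: reading off a location as the heatmap's peak and checking whether that peak falls inside a ground-truth box (a pointing-game style check), and one hypothetical way to measure disagreement between the heatmaps for a phrase and its synonym. All function names, the box convention, and the mean-squared-error penalty are illustrative assumptions, not the authors' training objective or code.

```python
from typing import List, Tuple

Heatmap = List[List[float]]  # assumed: rows x cols of explanation scores


def normalize(h: Heatmap) -> Heatmap:
    """Min-max scale a heatmap to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in h for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in row] for row in h]
    return [[(v - lo) / span for v in row] for row in h]


def heatmap_peak(h: Heatmap) -> Tuple[int, int]:
    """Return (row, col) of the highest-scoring cell (first one on ties)."""
    best = (0, 0)
    for r, row in enumerate(h):
        for c, v in enumerate(row):
            if v > h[best[0]][best[1]]:
                best = (r, c)
    return best


def pointing_hit(h: Heatmap, box: Tuple[int, int, int, int]) -> bool:
    """True if the peak lies inside box = (row0, col0, row1, col1), inclusive."""
    r, c = heatmap_peak(h)
    r0, c0, r1, c1 = box
    return r0 <= r <= r1 and c0 <= c <= c1


def consistency_penalty(h_phrase: Heatmap, h_synonym: Heatmap) -> float:
    """Hypothetical penalty: mean squared difference of the normalized maps."""
    a, b = normalize(h_phrase), normalize(h_synonym)
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


# Toy heatmaps for a phrase and a synonym of it (made-up values).
phrase = [[0.1, 0.2, 0.1], [0.2, 0.9, 0.3], [0.1, 0.2, 0.1]]
synonym = [[0.1, 0.1, 0.1], [0.1, 0.3, 0.8], [0.1, 0.1, 0.1]]

print(heatmap_peak(phrase))                 # (1, 1)
print(pointing_hit(phrase, (1, 1, 2, 2)))   # True
print(pointing_hit(synonym, (0, 0, 1, 1)))  # False: peak is at (1, 2)
print(consistency_penalty(phrase, phrase))  # 0.0
```

Here the two toy maps peak in different cells, so the penalty between them is positive. Driving such a penalty down is one way to ask a model to highlight the same region for a phrase and its synonym; whether and how the paper does this is not stated in this summary.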
Abstract
Vision-and-language models trained to match images with text can be combined with visual explanation methods to point to the locations of specific objects in an image. Our work shows that the localization --"grou