Mar, 2024
Learning from Models and Data for Visual Grounding
Ruozhen He, Paola Cascante-Bonilla, Ziyan Yang, Alexander C. Berg, Vicente Ordonez
TL;DR
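SynGround is a new framework that combines data-driven learning with knowledge transfer. It strengthens the visual grounding ability of pretrained vision-language models through knowledge transfer between models, uses synthetic images and text to improve model performance, and shows improvements on multiple datasets.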
Abstract
We introduce SynGround, a novel framework that combines data-driven learning and knowledge transfer from various large-scale pretrained models to enhance the …