BriefGPT.xyz
May, 2023
Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers
Dahun Kim, Anelia Angelova, Weicheng Kuo
TL;DR
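Proposes a pretraining method for region-aware open-vocabulary vision transformers (RO-ViT) that uses region-level positional embeddings in place of whole-image positional embeddings, achieving state-of-the-art results on the LVIS and COCO open-vocabulary detection benchmarks.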
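As a rough illustration of swapping whole-image positional embeddings for region-level ones, the sketch below crops a random region out of a grid of positional embeddings and resizes it back to the full grid size. It is a toy under stated assumptions, not the authors' implementation: the function names `random_region` and `crop_and_resize_pos_emb`, the nearest-neighbour sampling, and the `min_side` range are all hypothetical choices made for this example.

```python
import random


def random_region(rng, min_side=0.3):
    """Sample a box (y0, x0, y1, x1) in normalized [0, 1] coordinates.

    The sampling range is an assumption made for illustration only.
    """
    hh = rng.uniform(min_side, 1.0)
    ww = rng.uniform(min_side, 1.0)
    y0 = rng.uniform(0.0, 1.0 - hh)
    x0 = rng.uniform(0.0, 1.0 - ww)
    return (y0, x0, y0 + hh, x0 + ww)


def crop_and_resize_pos_emb(pos_emb, box, out_size):
    """Crop `box` from an H x W grid of embeddings and resize to out_size x out_size.

    Nearest-neighbour sampling keeps the sketch dependency-free; a real
    model would more likely interpolate.
    """
    h, w = len(pos_emb), len(pos_emb[0])
    y0, x0, y1, x1 = box
    out = []
    for i in range(out_size):
        # Map the centre of output row i into the source grid.
        fy = y0 + (i + 0.5) / out_size * (y1 - y0)
        sy = min(h - 1, int(fy * h))
        row = []
        for j in range(out_size):
            fx = x0 + (j + 0.5) / out_size * (x1 - x0)
            sx = min(w - 1, int(fx * w))
            row.append(pos_emb[sy][sx])
        out.append(row)
    return out


# Toy 4 x 4 grid whose "embedding" at (r, c) is simply [r, c].
grid = [[[r, c] for c in range(4)] for r in range(4)]

# Whole-image embeddings: the full box reproduces the grid unchanged.
whole = crop_and_resize_pos_emb(grid, (0.0, 0.0, 1.0, 1.0), 4)

# Region-level embeddings: a random box stretched to the full grid size.
rng = random.Random(0)
region = crop_and_resize_pos_emb(grid, random_region(rng), 4)
```

With the full box the grid comes back unchanged, while a smaller box stretches a sub-region of the embeddings across every patch position, so the patches see positions as if the image were a crop of a larger one.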
Abstract
We present region-aware open-vocabulary vision transformers (RO-ViT), a contrastive image-text pretraining recipe to bridge the gap between image-level pretraining and