BriefGPT.xyz
Apr, 2022
Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection
Yuxin Fang, Shusheng Yang, Shijie Wang, Yixiao Ge, Ying Shan...
TL;DR
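This paper proposes MIMDet, a detector that uses a pre-trained ViT encoder as its backbone and builds multi-scale representations by incorporating intermediate convolutional features. On COCO, it outperforms a more conservatively fine-tuned ViT detector by 2.5 box AP and 2.6 mask AP, and it converges faster.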
Abstract
We present an approach to efficiently and effectively adapt a masked image modeling (MIM) pre-trained vanilla vision transformer (ViT) for object…