BriefGPT.xyz
Feb, 2023
Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors
Ji Hou, Xiaoliang Dai, Zijian He, Angela Dai, Matthias Nießner
TL;DR
Proposes a pre-training method named Mask3D that leverages existing large-scale RGB-D data for self-supervised pre-training, embedding 3D priors into learned 2D features, and yields improvements on multiple scene-understanding tasks, especially semantic segmentation.
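The TL;DR describes masking parts of an RGB-D frame and using the depth channel as a self-supervised reconstruction target so that 3D priors are embedded into 2D features. The following is a minimal sketch of that idea, not the authors' implementation: the patch size, mask ratio, and the stand-in "predictor" are all illustrative assumptions.

```python
import numpy as np

# Hedged sketch (not the paper's code): mask patches of an RGB-D frame and
# score a reconstruction of the depth channel (the "3D prior") only at the
# masked patches, MAE-style. All hyperparameters here are assumptions.

def patchify(img, patch=4):
    """Split an (H, W, C) image into non-overlapping (patch, patch, C) tiles."""
    H, W, C = img.shape
    tiles = img.reshape(H // patch, patch, W // patch, patch, C)
    return tiles.transpose(0, 2, 1, 3, 4).reshape(-1, patch, patch, C)

def mask_patches(n_patches, mask_ratio=0.75, rng=None):
    """Pick a random subset of patch indices to hide from the encoder."""
    rng = rng or np.random.default_rng(0)
    n_mask = int(n_patches * mask_ratio)
    return rng.permutation(n_patches)[:n_mask]

def masked_depth_loss(pred_depth, true_depth, masked_idx):
    """MSE on depth, evaluated only at masked patches."""
    diff = pred_depth[masked_idx] - true_depth[masked_idx]
    return float(np.mean(diff ** 2))

# Toy RGB-D frame: 16x16 pixels, 3 color channels + 1 depth channel.
rgbd = np.random.default_rng(1).random((16, 16, 4))
patches = patchify(rgbd, patch=4)        # 16 patches of shape (4, 4, 4)
depth_patches = patches[..., 3:]         # depth channel as the target
masked = mask_patches(len(patches))      # indices withheld from the encoder

# A real model would predict depth from the visible RGB patches; a constant
# predictor stands in here for an untrained backbone.
pred = np.full_like(depth_patches, depth_patches.mean())
loss = masked_depth_loss(pred, depth_patches, masked)
print(len(masked), patches.shape, loss > 0)
```

In the actual method, the reconstruction loss would backpropagate through a ViT encoder so that its 2D features internalize depth structure; this sketch only shows the data flow of the masking objective.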
Abstract
Current popular backbones in computer vision, such as Vision Transformers (ViT) and ResNets, are trained to perceive the world from 2D images. However, to more effectively understand 3D structural priors in 2D backbones, we propose …