Probing the 3D Awareness of Visual Foundation Models

Apr, 2024
Mohamed El Banani, Amit Raj, Kevis-Kokitsi Maninis, Abhishek Kar, Yuanzhen Li...

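TL;DR: Recent advances in large-scale pretraining have yielded visual foundation models with strong capabilities. We analyze the 3D awareness of visual foundation models and, through a series of experiments, reveal several limitations of current models.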

Abstract

Recent advances in large-scale pretraining have yielded visual foundation models with strong capabilities. Not only can recent models generalize to arbitrary images for their training task, their intermediate rep