BriefGPT.xyz
Jun, 2023
Multi-CLIP: Contrastive Vision-Language Pre-training for Question Answering tasks in 3D Scenes
Alexandros Delitzas, Maria Parelli, Nikolas Hars, Georgios Vlassis, Sotirios Anagnostidis...
TL;DR
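This study proposes MULTI-CLIP, a 3D pre-trained vision-language model that effectively improves performance on existing 3D visual question answering tasks and produces a well-structured 3D scene feature space.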
Abstract
Training models to apply common-sense linguistic knowledge and visual concepts from 2D images to 3D scene understanding is a promising direction that researchers have only recently started to explore. However, it