BriefGPT.xyz
Aug, 2024
LLMI3D: Empowering LLM with 3D Perception from a Single 2D Image
Fan Yang, Sicheng Zhao, Yanhao Zhang, Haoxiang Chen, Hui Chen...
TL;DR
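This work addresses the weaknesses of existing small 3D perception models in logical reasoning and question answering. By proposing spatial-enhanced local feature mining, 3D query token-derived information decoding, and geometry projection-based 3D reasoning, we develop the LLMI3D model and construct the IG3D dataset to improve 3D perception capability. Experiments show that LLMI3D significantly outperforms existing methods.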
Abstract
Recent advancements in autonomous driving, augmented reality, robotics, and embodied intelligence have necessitated 3D perception algorithms. However, current …