Recognizing arbitrary objects in the wild remains challenging due to the limitations of existing classification models and datasets. In this paper, we propose a new task of parsing scenes with a large, open vocabulary, and we explore several evaluation metrics for this problem. Our proposed approach is a joint image
OV-SAM3D is a universal, training-free framework for open-vocabulary understanding of any 3D scene. It generates superpoints with the Segment Anything Model (SAM) and, guided by open tags from the Recognize Anything Model (RAM) together with an overlapping score table, combines the superpoints with segmentation masks to produce the final 3D instances. Empirical evaluation on the ScanNet200 and nuScenes datasets shows that our method outperforms existing open-vocabulary methods in unknown open-world environments.
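The merging step described above can be sketched in miniature. The code below is a hypothetical illustration, not the authors' implementation: superpoints (sets of 3D point ids) are matched against back-projected 2D masks via a simple overlap score, each superpoint adopts the open tag of its best-overlapping mask, and superpoints sharing a mask are grouped into one 3D instance. All function names, the scoring rule, and the threshold are illustrative assumptions.

```python
# Hypothetical sketch of an OV-SAM3D-style merging step.
# Superpoints and masks are represented as sets of 3D point ids;
# real pipelines would back-project SAM's 2D masks into 3D first.

def overlap_score(superpoint, mask_points):
    """Fraction of the superpoint's points covered by the mask."""
    if not superpoint:
        return 0.0
    return len(superpoint & mask_points) / len(superpoint)

def merge_instances(superpoints, masks, tags, threshold=0.5):
    """Assign each superpoint to its best-overlapping mask's open tag,
    then group superpoints matched to the same mask into 3D instances."""
    instances = {}
    for sp_id, sp in superpoints.items():
        best_mask, best_score = None, threshold
        for m_id, mask in masks.items():
            score = overlap_score(sp, mask)
            if score > best_score:
                best_mask, best_score = m_id, score
        if best_mask is not None:
            inst = instances.setdefault(
                best_mask, {"tag": tags[best_mask], "points": set()}
            )
            inst["points"] |= sp
    return instances

# Toy scene: 6 points, two superpoints, two masks with RAM-style open tags.
superpoints = {0: {0, 1, 2}, 1: {3, 4, 5}}
masks = {"m0": {0, 1, 2, 3}, "m1": {4, 5}}
tags = {"m0": "chair", "m1": "table"}
result = merge_instances(superpoints, masks, tags)
```

In this toy example, superpoint 0 overlaps mask `m0` completely and is labeled "chair", while superpoint 1 overlaps `m1` at 2/3 (above the 0.5 threshold) and is labeled "table"; the threshold keeps weakly overlapping superpoints unassigned rather than mislabeled.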