We propose UniSeg3D, a unified 3D segmentation framework that achieves panoptic, semantic, instance, interactive, referring, and open-vocabulary semantic segmentation tasks within a single model. Most previous 3D segmentation approaches are specialized for a specific task, thereby limiting their understanding of 3D scenes to a task-specific perspective. In contrast, the proposed method unifies six tasks into unified representations processed by the same Transformer. It facilitates inter-task knowledge sharing and, therefore, promotes comprehensive 3D scene understanding. To take advantage of multi-task unification, we enhance the performance by leveraging task connections. Specifically, we design a knowledge distillation method and a contrastive learning method to transfer task-specific knowledge across different tasks. Benefiting from extensive inter-task knowledge sharing, our UniSeg3D becomes more powerful. Experiments on three benchmarks, including the ScanNet20, ScanRefer, and ScanNet200, demonstrate that the UniSeg3D consistently outperforms current SOTA methods, even those specialized for individual tasks. We hope UniSeg3D can serve as a solid unified baseline and inspire future work. The code will be available at https://dk-liang.github.io/UniSeg3D/.

提出了UniSeg3D，这是一个统一的三维分割框架，可以在一个模型内完成全景、语义、实例、交互、指向性和开放词汇的语义分割任务。该方法将六个任务统一为由相同Transformer处理的统一表示，促进了任务间的知识共享，从而提升了对三维场景的综合理解。通过利用任务连接，通过设计知识蒸馏和对比学习方法，在多任务统一化的基础上提高了性能。在三个基准测试中的实验证明了UniSeg3D的优越性，即使是那些专门针对特定任务的方法也无法与之相比。希望UniSeg3D能够作为一个坚实的统一基准，并激发未来的研究。

一个统一的三维场景理解框架