In this paper, we study the problem of 3D scene geometry decomposition and manipulation from 2D views. By leveraging recent implicit neural representation techniques, particularly neural radiance fields (NeRF), we introduce an object field component that learns a unique code for every individual object in 3D space from 2D supervision alone. The key to this component is a series of carefully designed loss functions that enable every 3D point, especially in non-occupied space, to be effectively optimized even without 3D labels. In addition, we introduce an inverse query algorithm to freely manipulate any specified 3D object shape in the learned scene representation. Notably, our manipulation algorithm explicitly tackles key issues such as object collisions and visual occlusions. Our method, called DM-NeRF, is among the first to simultaneously reconstruct, decompose, manipulate, and render complex 3D scenes in a single pipeline. Extensive experiments on three datasets show that our method accurately decomposes all 3D objects from 2D views, allowing any object of interest to be freely manipulated in 3D space through operations such as translation, rotation, size adjustment, and deformation.
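The abstract does not spell out how per-point object codes are tied to 2D supervision. One common way to connect a 3D field to 2D object masks, assumed here purely for illustration, is to composite per-sample object probabilities along each camera ray with the same volume-rendering weights NeRF uses for color, and then apply a cross-entropy loss against the pixel's 2D object label. The sketch below shows only that compositing step; the function names `render_object_probs` and `pixel_cross_entropy`, the softmax over a fixed number of object slots, and the loss are hypothetical choices, and the paper's dedicated losses for non-occupied space are not modeled.

```python
import math
from typing import List, Sequence


def render_object_probs(
    sigmas: Sequence[float],
    deltas: Sequence[float],
    object_logits: Sequence[Sequence[float]],
) -> List[float]:
    """Composite per-sample object probabilities along one ray (NeRF-style weights).

    sigmas: volume density at each sample along the ray.
    deltas: distance from each sample to the next.
    object_logits: hypothetical per-sample object code, one logit per object slot.
    """
    num_objects = len(object_logits[0])
    pixel = [0.0] * num_objects
    transmittance = 1.0  # T_i = exp(-sum_{j<i} sigma_j * delta_j)
    for sigma, delta, logits in zip(sigmas, deltas, object_logits):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weight = transmittance * alpha          # same weight NeRF uses for color
        # Numerically stable softmax: object code -> probability over object slots.
        peak = max(logits)
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        for k in range(num_objects):
            pixel[k] += weight * exps[k] / total
        transmittance *= 1.0 - alpha            # light surviving past this sample
    return pixel


def pixel_cross_entropy(pixel_probs: Sequence[float], label: int, eps: float = 1e-8) -> float:
    """Assumed 2D supervision: negative log-probability of the pixel's mask label."""
    return -math.log(pixel_probs[label] + eps)


# Example: an empty sample followed by a dense sample belonging to object slot 1.
probs = render_object_probs([0.0, 100.0], [1.0, 1.0], [[5.0, 0.0], [0.0, 5.0]])
loss = pixel_cross_entropy(probs, label=1)
# probs is roughly [0.0067, 0.9933]; loss is roughly 0.0067
```

Because the compositing weights sum to at most one, a ray that passes only through empty space renders near-zero probability for every slot; how such non-occupied regions are supervised in the paper is not described in the abstract and is left out of this sketch.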

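This work builds on the latest neural radiance field techniques. It introduces an object field component that learns unique codes for all individual objects in 3D space from 2D views, together with an inverse query algorithm that freely manipulates specified 3D object shapes in the learned scene representation, thereby addressing key issues such as object collisions and visual occlusions. The resulting method, which can accurately decompose and manipulate 3D scenes from 2D views, is called DM-NeRF.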
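The idea behind the inverse query can be illustrated with a simplified, hypothetical sketch: to render a scene in which one object has been moved by a rigid transform T, each sample point p of the edited scene is mapped back through the inverse of T and the original field is queried there; if that query returns the target object, the target now occupies p. The field interface (`toy_field`, returning a density and an object id, with id 0 meaning empty space), the yaw-plus-translation transform, and the collision rule (another object already has nonzero density at p) are all assumptions made for illustration, not the paper's actual algorithm.

```python
import math
from typing import Callable, Tuple

Point = Tuple[float, float, float]
# Hypothetical field interface: point -> (density, object id); id 0 means empty space.
Field = Callable[[Point], Tuple[float, int]]


def inverse_rigid(p: Point, yaw: float, t: Point) -> Point:
    """Invert x' = R_z(yaw) x + t, i.e. return R_z(yaw)^T (x' - t)."""
    x, y, z = p[0] - t[0], p[1] - t[1], p[2] - t[2]
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * x + s * y, -s * x + c * y, z)


def query_edited(field: Field, p: Point, target: int, yaw: float, t: Point) -> Tuple[float, int, bool]:
    """Query the edited scene at p; returns (density, object id, collision flag)."""
    density, obj = field(p)
    if obj == target:  # the target has been moved away from its original location
        density, obj = 0.0, 0
    src_density, src_obj = field(inverse_rigid(p, yaw, t))
    if src_obj == target:  # the transformed target now occupies p
        collision = obj != 0 and density > 0.0  # another object is already here
        return src_density, target, collision
    return density, obj, False


def toy_field(p: Point) -> Tuple[float, int]:
    """Toy scene: unit cube object 1 at the origin, unit cube object 2 at x = 3."""
    if all(abs(v) <= 0.5 for v in p):
        return 10.0, 1
    if abs(p[0] - 3.0) <= 0.5 and abs(p[1]) <= 0.5 and abs(p[2]) <= 0.5:
        return 10.0, 2
    return 0.0, 0


# Move object 1 by 2.7 along x, so it now overlaps object 2 near x = 3.
print(query_edited(toy_field, (0.0, 0.0, 0.0), target=1, yaw=0.0, t=(2.7, 0.0, 0.0)))
# -> (0.0, 0, False): the old location of object 1 is now empty
print(query_edited(toy_field, (3.0, 0.0, 0.0), target=1, yaw=0.0, t=(2.7, 0.0, 0.0)))
# -> (10.0, 1, True): the moved object 1 collides with object 2
```

In this sketch, visual occlusion needs no special handling: passing the edited densities to a standard NeRF volume renderer lets the moved object hide, or be hidden by, other objects along each ray. The collision flag only reports overlap and leaves the resolution policy, such as rejecting or adjusting the edit, to the caller.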

DM-NeRF: 3D Scene Geometry Decomposition and Manipulation from 2D Images