We present a novel approach that converts partial and noisy RGB-D scans into high-quality 3D scene reconstructions by inferring unobserved scene geometry. Our approach is fully self-supervised and can hence be trained solely on real-world, incomplete scans. To achieve self-supervision, we remove frames from a given (incomplete) 3D scan in order to make it even more incomplete; self-supervision is then formulated by correlating the two levels of partialness of the same scan while masking out regions that have never been observed. Through generalization across a large training set, we can then predict 3D scene completion without ever seeing any 3D scan of entirely complete geometry. Combined with a new 3D sparse generative neural network architecture, our method is able to predict highly detailed surfaces in a coarse-to-fine hierarchical fashion, generating 3D scenes at 2 cm resolution, more than twice the resolution of existing state-of-the-art methods, while outperforming them by a significant margin in reconstruction quality.
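The masking step described above can be illustrated with a minimal sketch. It assumes the scans are stored as dense TSDF arrays and that the loss is a plain L1 difference; the abstract specifies neither, and the names `masked_l1_loss`, `pred_tsdf`, `target_tsdf`, and `observed_mask` are hypothetical. The prediction is made from the scan with frames removed, the target is the original (still incomplete) scan, and voxels the target never observed are excluded from the loss.

```python
import numpy as np

def masked_l1_loss(pred_tsdf, target_tsdf, observed_mask):
    """Mean absolute error over voxels observed in the target scan.

    Assumption: an L1 loss on raw TSDF values, chosen for illustration only.
    Voxels where observed_mask is False contribute nothing, so the network
    is not penalized for filling in geometry the target scan never saw.
    """
    observed_mask = observed_mask.astype(bool)
    n_observed = int(observed_mask.sum())
    if n_observed == 0:
        return 0.0  # nothing observed, so there is nothing to supervise
    abs_err = np.abs(pred_tsdf - target_tsdf)
    return float(abs_err[observed_mask].sum() / n_observed)

# Toy 1D "scan": the third voxel was never observed in the target.
target = np.array([0.5, -0.2, 0.0, 0.3])
observed = np.array([1, 1, 0, 1])
# Prediction made from the more incomplete input; its value at the
# unobserved voxel is arbitrary and must not affect the loss.
pred = np.array([0.5, 0.0, 9.0, 0.1])

loss = masked_l1_loss(pred, target, observed)
print(loss)
```

In this toy case only the three observed voxels are averaged, so the large prediction at the unobserved voxel is ignored.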

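Summary: the paper proposes a self-supervised method that converts partial, noisy RGB-D scans into high-quality 3D scene reconstructions by predicting unobserved scene geometry, and uses a new 3D sparse generative neural network architecture to generate high-resolution 3D scene surfaces and improve reconstruction quality.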

SG-NN: Sparse Generative Neural Networks for Self-Supervised Scene Completion of RGB-D Scans