3D semantic occupancy prediction provides a fine-grained representation of the surrounding environment, which is essential for safe autonomous driving. Existing fusion-based occupancy methods typically apply a 2D-to-3D view transformation to image features and then fuse the result with LiDAR features through computationally intensive 3D operations, which leads to high computational cost and reduced accuracy. Moreover, current research on occupancy prediction focuses predominantly on designing network architectures, often tailored to particular models, and pays limited attention to the more fundamental problem of semantic feature learning. This gap hinders the development of transferable methods that could improve a wide range of occupancy models. To address these challenges, we propose OccLoff, a framework that Learns to Optimize Feature Fusion for 3D occupancy prediction. Specifically, we introduce a sparse fusion encoder with entropy masks that directly fuses 3D and 2D features, improving accuracy while reducing computational overhead. Additionally, we propose a transferable proxy-based loss function and an adaptive hard sample weighting algorithm, which improve the performance of several state-of-the-art methods. Extensive evaluations on the nuScenes and SemanticKITTI benchmarks demonstrate the superiority of our framework, and ablation studies confirm the effectiveness of each proposed module.
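
The abstract does not specify how the entropy masks are computed, so the following is only an illustrative sketch of one plausible reading: score each voxel by the Shannon entropy of its predicted class distribution and keep the most uncertain voxels as the sparse set passed to fusion. The function names, the `keep_ratio` parameter, and the top-k selection rule are assumptions made for illustration, not details taken from OccLoff.

```python
import math

def shannon_entropy(probs):
    """Entropy (in nats) of one discrete distribution; zero entries contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_mask(voxel_probs, keep_ratio=0.5):
    """Return a boolean mask marking the highest-entropy (most uncertain) voxels.

    voxel_probs: list of per-voxel class-probability lists (each sums to 1).
    keep_ratio:  hypothetical hyperparameter, the fraction of voxels to keep.
    """
    entropies = [shannon_entropy(p) for p in voxel_probs]
    n_keep = max(1, round(keep_ratio * len(entropies)))
    # Rank voxel indices from most to least uncertain and keep the first n_keep.
    ranked = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    keep = set(ranked[:n_keep])
    return [i in keep for i in range(len(entropies))]

# Toy example: four voxels, three semantic classes.
voxel_probs = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # nearly uniform, highly uncertain
    [0.70, 0.20, 0.10],  # moderately uncertain
    [1.00, 0.00, 0.00],  # fully certain
]
mask = entropy_mask(voxel_probs, keep_ratio=0.5)
print(mask)  # [False, True, True, False]
```

Under this reading, only the masked voxels would go through the costlier cross-modal fusion step, which is one way a sparse encoder could reduce computation without discarding the regions where the prediction is least certain.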

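This work addresses inefficient feature fusion and high computational cost in 3D semantic occupancy prediction. The proposed OccLoff framework uses a sparse fusion encoder and an adaptive hard sample weighting algorithm to significantly improve model accuracy while reducing computational overhead. Experiments show that the framework performs strongly on multiple benchmarks and demonstrate its potential to enhance different occupancy models.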

OccLoff: Learning to Optimize Feature Fusion for 3D Occupancy Prediction