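TL;DR: Enhances 3D representation learning by leveraging pixel-level semantics from vision foundation models (VFMs) and structures the feature space with a von Mises-Fisher distribution, addressing the challenges of contrastive methods and consistently outperforming existing image-to-LiDAR contrastive distillation methods on downstream tasks.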
Abstract
Contrastive image-to-LiDAR knowledge transfer, commonly used for learning 3D
representations with synchronized images and point clouds, often faces a
self-conflict dilemma. This issue arises as contrastive losses unintentionally
dissociate features of unmatched points and pixels that s