Although remarkable advancements have been made recently in point cloud analysis through the exploration of transformer architecture, it remains challenging to effectively learn local and global structures within point clouds. In this paper, we propose a new transformer architecture equipped with a collect-and-distribute mechanism to communicate short- and long-range contexts of point clouds, which we refer to as CDFormer. Specifically, we first utilize self-attention to capture short-range interactions within each local patch, and the updated local features are then collected into a set of proxy reference points from which we can extract long-range contexts. Afterward, we distribute the learned long-range contexts back to local points via cross-attention. To address the position clues for short- and long-range contexts, we also introduce context-aware position encoding to facilitate position-aware communications between points. We perform experiments on four popular point cloud datasets, namely ModelNet40, ScanObjectNN, S3DIS, and ShapeNetPart, for classification and segmentation. Results show the effectiveness of the proposed CDFormer, delivering several new state-of-the-art performances on point cloud classification and segmentation tasks. The code is available at \url{https://github.com/haibo-qiu/CDFormer}.

本研究提出了CDFormer，一种新的利用收集和分布机制的Transformer架构，可对点云的局部和全局结构进行有效学习，并在四个流行的点云数据集上取得了新的最佳分类和分割结果。

用于3D点云分析的收集和分配变换器