In this work, we address the task of few-shot part segmentation, which aims to segment the different parts of an unseen object using very few labeled examples. Leveraging the textual space of a powerful pre-trained image-language model (such as CLIP) has been found to benefit the learning of visual features. We therefore develop PartSeg, a novel method for few-shot part segmentation based on multimodal learning. Specifically, we design a part-aware prompt learning method that generates part-specific prompts, enabling the CLIP model to better understand the concept of ``part'' and to fully utilize its textual space. Furthermore, since the concept of the same part generalizes across object categories, we establish relationships between these parts from different object categories during the prompt learning process. We conduct extensive experiments on the PartImageNet and Pascal\_Part datasets, and the experimental results demonstrate that our proposed method achieves state-of-the-art performance.

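Using a powerful pre-trained image-language model such as CLIP, this work develops a new method named PartSeg for the few-shot part segmentation task, based on multimodal learning with very few labeled samples. The method uses part-aware prompt learning to generate part-specific prompts, enabling the CLIP model to better understand the concept of ``part'' and to fully utilize its textual space. Experimental results on the PartImageNet and Pascal\_Part datasets demonstrate the method's state-of-the-art performance.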

PartSeg: Few-Shot Part Segmentation via Part-Aware Prompt Learning