We introduce One-shot Open Affordance Learning (OOAL), where a model is trained with just one example per base object category but is expected to identify novel objects and affordances. While vision-language models excel at recognizing novel objects and scenes, they often struggle to understand finer levels of granularity such as affordances. To handle this issue, we conduct a comprehensive analysis of existing foundation models to explore their inherent understanding of affordances and to assess the potential for data-limited affordance learning. We then propose a vision-language framework with simple and effective designs that boost the alignment between visual features and affordance text embeddings. Experiments on two affordance segmentation benchmarks show that the proposed method outperforms state-of-the-art models with less than 1% of the full training data, and exhibits reasonable generalization capability on unseen objects and affordances.

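This paper introduces One-shot Open Affordance Learning (OOAL), in which a model is trained with only one example per base object category yet can identify novel objects and affordances. Experiments show that the method outperforms existing models on two affordance segmentation benchmarks while using less than 1% of the full training data, and that it generalizes reasonably well to unseen objects and affordances.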

One-Shot Open Affordance Learning with Foundation Models