CLIP, as a vision-language model, has significantly advanced Open-Vocabulary Semantic Segmentation (OVSS) with its zero-shot capabilities. Despite this success, applying CLIP to OVSS remains challenging: it is trained for image-level alignment, which limits its performance on tasks that require detailed local context. Our study examines the impact of CLIP's [CLS] token on patch feature correlations and reveals a dominance of "global" patches that hinders local feature discrimination. To overcome this, we propose CLIPtrase, a novel training-free semantic segmentation strategy that enhances local feature awareness through recalibrated self-correlation among patches. This approach yields notable improvements in segmentation accuracy and in the ability to maintain semantic coherence across objects. Experiments show that CLIPtrase outperforms CLIP by 22.3% on average across 9 segmentation benchmarks and surpasses existing state-of-the-art training-free methods. The code is publicly available at: https://github.com/leaves162/CLIPtrase.
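
To make the idea of patch self-correlation concrete, the following is a minimal NumPy sketch and not the CLIPtrase implementation. It assumes patch features are already available as an (N, D) array; the function names, the cosine-similarity form, and the `temperature` parameter are illustrative assumptions, not details taken from the paper or its repository.

```python
import numpy as np


def patch_self_correlation(patch_feats, temperature=0.1):
    """Row-normalized self-correlation among patches (illustrative only).

    patch_feats: array of shape (N, D), one feature vector per patch.
    temperature: hypothetical sharpening parameter, not from the paper.
    """
    # L2-normalize so that dot products become cosine similarities.
    norms = np.linalg.norm(patch_feats, axis=1, keepdims=True)
    normed = patch_feats / (norms + 1e-8)
    sim = normed @ normed.T  # (N, N) cosine similarity between patches

    # Softmax over each row, with the row maximum subtracted for stability.
    logits = sim / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


def aggregate_patches(patch_feats, weights):
    """Re-express each patch as a correlation-weighted mix of all patches."""
    return weights @ patch_feats
```

In this sketch a lower `temperature` concentrates each patch's weights on its most similar patches, which is one simple way to emphasize local over global context; how CLIPtrase actually recalibrates the correlations is described in the paper and the linked repository.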

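By studying the effect of CLIP's [CLS] token on patch feature correlations, we propose CLIPtrase, a training-free semantic segmentation strategy that improves local feature awareness by recalibrating the self-correlation among patches. The method shows notable improvements in segmentation accuracy and in maintaining semantic coherence across objects, outperforming existing state-of-the-art training-free methods.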

Exploring the Potential of CLIP for Training-Free Open-Vocabulary Semantic Segmentation