CLIP embeddings have demonstrated remarkable performance across a wide range of computer vision tasks. However, these high-dimensional, dense vector representations are not easily interpretable, restricting their usefulness in downstream applications that require transparency. In this work, we empirically show that CLIP's latent space is highly structured, and consequently that CLIP representations can be decomposed into their underlying semantic components. We leverage this understanding to propose a novel method, Sparse Linear Concept Embeddings (SpLiCE), for transforming CLIP representations into sparse linear combinations of human-interpretable concepts. Distinct from previous work, SpLiCE does not require concept labels and can be applied post hoc. Through extensive experimentation with multiple real-world datasets, we validate that the representations output by SpLiCE can explain and even replace traditional dense CLIP representations, maintaining equivalent downstream performance while significantly improving their interpretability. We also demonstrate several use cases of SpLiCE representations including detecting spurious correlations, model editing, and quantifying semantic shifts in datasets.

通过实验证明，CLIP的潜在空间高度结构化，因此CLIP表示可以分解为其潜在的语义组成部分，并提出Sparse Linear Concept Embeddings（SpLiCE）的新方法，将CLIP表示转化为人类可解释概念的稀疏线性组合。通过实验验证，SpLiCE输出的表示可以解释甚至取代传统的密集CLIP表示，保持等同的下游性能同时显著提高解释性，并展示了SpLiCE表示的几个用例，包括检测虚假相关性、模型编辑和量化数据集中的语义转换。

基于稀疏线性概念嵌入的解读CLIP模型（SpLiCE）