The development of vision-language models (VLMs) for histopathology has opened promising new uses and shown strong zero-shot performance. However, current approaches, which decompose large slides into smaller patches, focus solely on inductive classification, i.e., the prediction for each patch is made independently of the other patches in the target test data. We extend the capability of these large models by introducing a transductive approach. By using text-based predictions and affinity relationships among patches, our approach leverages the strong zero-shot capabilities of these new VLMs without any additional labels. Our experiments cover four histopathology datasets and five different VLMs. Operating solely in the embedding space (i.e., in a black-box setting), our approach is highly efficient, processing $10^5$ patches in just a few seconds, and shows significant accuracy improvements over inductive zero-shot classification. Code is available at https://github.com/FereshteShakeri/Histo-TransCLIP.
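To make the inductive versus transductive distinction concrete, the following is a minimal sketch of the general idea, not the method implemented in the repository above. It assumes patch and text embeddings are already available as NumPy arrays, and it swaps in a simple label-propagation-style update in place of the paper's actual objective. All function names, the temperature of 100, the neighbor count `k`, and the iteration count are illustrative assumptions. The dense affinity matrix is only practical for small inputs; at the scale of $10^5$ patches a sparse representation would be needed.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def zero_shot_probs(patch_emb, text_emb, temperature=100.0):
    """Inductive zero-shot prediction: each patch is scored on its own.

    patch_emb: (n_patches, dim), text_emb: (n_classes, dim).
    The temperature value is an assumption, not taken from the paper.
    """
    patch_emb = patch_emb / np.linalg.norm(patch_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    return softmax(temperature * patch_emb @ text_emb.T)


def knn_affinity(patch_emb, k=3):
    """Dense, symmetric k-nearest-neighbor affinity from cosine similarity."""
    emb = patch_emb / np.linalg.norm(patch_emb, axis=1, keepdims=True)
    sim = emb @ emb.T
    np.fill_diagonal(sim, -np.inf)            # exclude self-matches
    idx = np.argsort(-sim, axis=1)[:, :k]     # k most similar patches per row
    rows = np.arange(sim.shape[0])[:, None]
    affinity = np.zeros_like(sim)
    affinity[rows, idx] = np.maximum(sim[rows, idx], 0.0)
    return np.maximum(affinity, affinity.T)   # symmetrize


def transductive_refine(probs, affinity, n_iters=10):
    """Jointly refine all predictions (simplified, hypothetical update rule).

    Each patch keeps its text-based prediction as a prior and is pulled
    toward the current predictions of the patches it is connected to.
    """
    log_prior = np.log(probs + 1e-12)
    z = probs.copy()
    for _ in range(n_iters):
        z = softmax(log_prior + affinity @ z)
    return z
```

A typical call sequence under these assumptions would be `probs = zero_shot_probs(patch_emb, text_emb)`, then `refined = transductive_refine(probs, knn_affinity(patch_emb))`. Only embeddings are used, which matches the black-box setting described above.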

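This work addresses a limitation of current vision-language models in histopathology, which classify each patch independently, by proposing a transductive approach that combines text-based predictions with affinity relationships among patches to improve performance. Experiments show that the method substantially improves classification accuracy on four datasets while processing large volumes of data efficiently, demonstrating its strong potential in the label-free setting.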

Boosting Vision-Language Models for Histopathology Classification: Predict All at Once