Oct, 2023

Distilling from Vision-Language Models for Improving OOD Generalization in Vision Tasks

TL;DR: Vision-Language to Vision - Align, Distill, Predict (VL2V-ADiP) is a proposed approach that aligns the vision and language modalities of a Vision-Language Model such as CLIP and distills its pre-trained features and superior generalization into a vision model, achieving state-of-the-art results in Domain Generalization.
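To make the "align and distill" idea concrete, here is a minimal sketch of a feature-distillation objective. It is not taken from the paper: it assumes the student's feature vector is pulled toward both the teacher's image embedding and the teacher's text embedding of the class name, using cosine distance. The names `cosine`, `distill_loss`, `student_feat`, `teacher_img_emb`, and `teacher_txt_emb` are hypothetical, and plain Python lists stand in for real model outputs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def distill_loss(student_feat, teacher_img_emb, teacher_txt_emb):
    """Assumed objective: mean cosine distance from the student feature
    to the teacher's image embedding and to its text embedding."""
    img_term = 1.0 - cosine(student_feat, teacher_img_emb)
    txt_term = 1.0 - cosine(student_feat, teacher_txt_emb)
    return 0.5 * (img_term + txt_term)


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real embeddings.
    student = [1.0, 0.0]
    teacher_image = [0.0, 1.0]   # orthogonal to the student feature
    teacher_text = [1.0, 0.0]    # identical to the student feature
    print(distill_loss(student, teacher_image, teacher_text))
```

In this toy case the image term contributes its full cosine distance and the text term contributes none, so minimizing the loss would rotate the student feature toward a direction that agrees with both teacher embeddings. A real training setup would compute this per batch on tensors and backpropagate through the student only.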