Pre-trained contrastive vision-language models have demonstrated remarkable
performance across a wide range of tasks. However, they often struggle on
fine-grained datasets whose categories are not adequately represented during
pre-training, which makes adaptation necessary. Recent works hav