Large-scale vision-language models (VLMs) have shown strong zero-shot generalization on unseen-domain data. However, when pre-trained VLMs are adapted to a sequence of downstream tasks, they are prone to forgetting previously learned knowledge and to losing their zero-shot classification capability. To tackle this problem, we propose a Selective Dual-Teacher Knowledge Transfer framework that uses the most recently fine-tuned VLM and the original pre-trained VLM as dual teachers, which preserve previously learned knowledge and zero-shot capability, respectively. Requiring access only to an unlabeled reference dataset, our framework performs selective knowledge distillation by measuring the feature discrepancy between the two teacher VLMs. As a result, selective dual-teacher knowledge distillation mitigates catastrophic forgetting of previously learned knowledge while preserving the zero-shot capability of the pre-trained VLM. Extensive experiments on benchmark datasets show that our framework compares favorably with state-of-the-art continual learning approaches in preventing both catastrophic forgetting and zero-shot degradation.
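The selection step can be illustrated with a small sketch. It assumes, as a hypothetical rule rather than the paper's exact formulation, that each unlabeled reference sample is routed to one teacher by a hard threshold `tau` on the cosine discrepancy between the two teachers' features. A large discrepancy suggests the sample resembles data from previously learned tasks, where fine-tuning changed the representation, so the student follows the fine-tuned teacher. A small discrepancy routes the sample to the pre-trained teacher to retain zero-shot behavior. The function names, the threshold, and the squared-L2 feature-matching loss are illustrative assumptions, and features are plain Python lists standing in for encoder outputs.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """L2-normalize a feature vector; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_discrepancy(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity between two feature vectors."""
    a_n, b_n = normalize(a), normalize(b)
    return 1.0 - sum(x * y for x, y in zip(a_n, b_n))


def feature_distance(a: Vector, b: Vector) -> float:
    """Squared L2 distance between normalized features (assumed distillation loss)."""
    return sum((x - y) ** 2 for x, y in zip(normalize(a), normalize(b)))


def selective_dual_teacher_loss(
    student_feats: Sequence[Vector],
    finetuned_feats: Sequence[Vector],
    pretrained_feats: Sequence[Vector],
    tau: float = 0.5,  # hypothetical threshold, not a value from the paper
) -> Tuple[float, List[str]]:
    """Average distillation loss over a batch of unlabeled reference samples.

    Hypothetical selection rule: if the two teachers disagree strongly on a
    sample (discrepancy > tau), distill from the fine-tuned teacher to keep
    previously learned knowledge; otherwise distill from the pre-trained
    teacher to keep zero-shot capability.
    """
    n = len(student_feats)
    if n == 0 or n != len(finetuned_feats) or n != len(pretrained_feats):
        raise ValueError("feature batches must be non-empty and of equal length")

    total = 0.0
    choices: List[str] = []
    for s, ft, pre in zip(student_feats, finetuned_feats, pretrained_feats):
        if cosine_discrepancy(ft, pre) > tau:
            teacher, choice = ft, "finetuned"
        else:
            teacher, choice = pre, "pretrained"
        choices.append(choice)
        total += feature_distance(s, teacher)
    return total / n, choices


if __name__ == "__main__":
    # Sample 0: teachers disagree -> fine-tuned teacher; sample 1: they agree -> pre-trained.
    loss, picked = selective_dual_teacher_loss(
        student_feats=[[1.0, 0.0], [0.0, 1.0]],
        finetuned_feats=[[1.0, 0.0], [0.0, 1.0]],
        pretrained_feats=[[0.0, 1.0], [0.0, 1.0]],
    )
    print(loss, picked)
```

In an actual training loop, the features would come from the student and teacher image encoders of a VLM such as CLIP. A soft, discrepancy-weighted mix of the two teacher losses is an equally plausible variant of the hard selection shown here.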


Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models