Continual learning can empower vision-language models to continuously acquire new knowledge, without the need for access to the entire historical dataset. However, mitigating the performance degradation in large-scale models is non-trivial due to (i) parameter shifts throughout lifelong learning and (ii) significant computational burdens associated with full-model tuning. In this work, we present a parameter-efficient continual learning framework to alleviate long-term forgetting in incremental learning with vision-language models. Our approach involves the dynamic expansion of a pre-trained CLIP model, through the integration of Mixture-of-Experts (MoE) adapters in response to new tasks. To preserve the zero-shot recognition capability of vision-language models, we further introduce a Distribution Discriminative Auto-Selector (DDAS) that automatically routes in-distribution and out-of-distribution inputs to the MoE Adapter and the original CLIP, respectively. Through extensive experiments across various settings, our proposed method consistently outperforms previous state-of-the-art approaches while concurrently reducing parameter training burdens by 60%. Our code locates at https://github.com/JiazuoYu/MoE-Adapters4CL

提出了一种参数高效的持续学习框架，通过在视觉语言模型中动态扩展一个预训练的CLIP模型，采用专家混合（Mixture-of-Experts）适配器以应对新任务，并引入分布鉴别自动选择器（DDAS）以保留视觉语言模型的零样本识别能力，并通过各种实验验证，该方法在提升性能的同时减少了60%的参数训练负担。

通过专家混合适配器增强视觉语言模型的持续学习