Large pre-trained vision language models (VLMs) have shown impressive zero-shot ability on downstream tasks with manually designed prompt, which are not optimal for specific domains. To further adapt VLMs to downstream tasks, soft prompt is proposed to replace manually designed prompt, which acts as a learning vector that undergoes fine-tuning based on specific domain data. Prior prompt learning methods primarily learn a fixed prompt and residuled prompt from training samples. However, the learned prompts lack diversity and ignore information about unseen domains, potentially compromising the transferability of the prompts. In this paper, we reframe the prompt learning framework from a generative perspective and propose a simple yet efficient method for the Domain Generalization (DG) task, namely \textbf{S}oft \textbf{P}rompt \textbf{G}eneration (SPG). To the best of our knowledge, we are the first to introduce the generative model into prompt learning in VLMs and explore its potential for producing soft prompts by relying solely on the generative model, ensuring the diversity of prompts. Specifically, SPG consists of a two-stage training phase and an inference phase. During the training phase, we introduce soft prompt labels for each domain, aiming to incorporate the generative model domain knowledge. During the inference phase, the generator of the generative model is employed to obtain instance-specific soft prompts for the unseen target domain. Extensive experiments on five domain generalization benchmarks of three DG tasks demonstrate that our proposed SPG achieves state-of-the-art performance. The code will be available soon.

大型预训练视觉语言模型（VLMs）在下游任务中展现出令人印象深刻的零-shot能力，但人工设计的提示对特定领域不够优化。本文提出了一种用于下游任务的软提示方法，通过在特定域数据上进行微调，将软提示作为学习向量。我们从生成的角度重构了提示学习框架，并提出了一种简单而高效的域泛化（DG）任务方法，即软提示生成（SPG）。在训练阶段，我们引入了每个领域的软提示标签，以融合生成模型的领域知识。在推理阶段，生成模型的生成器被用来获取未知目标域的实例特定软提示。对三个域泛化任务的五个领域泛化基准进行的大量实验证明了我们提出的SPG方法达到了最先进的性能。代码将很快提供。

领域泛化的软提示生成