Building Robust Prompts for Vision-Language Models

Apr 2023

Towards Robust Prompts on Vision-Language Models

Jindong Gu, Ahmad Beirami, Xuezhi Wang, Alex Beutel, Philip Torr...

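TL;DR: This work proposes a prompt learning method that is robust to distribution shifts by integrating multi-scale image features into the prompt. Experiments show that the method improves both robustness and performance on multiple benchmark datasets.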

Abstract

With the advent of vision-language models (VLMs) that can perform in-context and prompt-based learning, how can we design prompting approaches that robustly generalize to →