Vision-Language Models (VLMs) have witnessed a surge in both research and real-world applications. However, as they are becoming increasingly prevalent, ensuring their robustness against adversarial attacks is paramount. This work systematically investigates the impact of model design choices on the adversarial robustness of VLMs against image-based attacks. Additionally, we introduce novel, cost-effective approaches to enhance robustness through prompt formatting. By rephrasing questions and suggesting potential adversarial perturbations, we demonstrate substantial improvements in model robustness against strong image-based attacks such as Auto-PGD. Our findings provide important guidelines for developing more robust VLMs, particularly for deployment in safety-critical environments.

机视语言模型在研究和实际应用中取得了突破，但其对抗性攻击的鲁棒性至关重要。本研究系统地研究了模型设计选择对机视语言模型在图像攻击方面的抗打击能力的影响。此外，我们引入了新颖且经济的方法通过提示格式来增强鲁棒性。通过改写问题和建议可能的对抗性扰动，我们在抵御强大的图像攻击（如Auto-PGD）方面实现了显著的改进。我们的发现为开发更具鲁棒性的机视语言模型提供了重要指导，尤其是在安全关键环境中的部署。

走向对抗性强大的视觉语言模型：设计选择和提示格式技术的洞察