Despite their remarkable advancements, Multimodal Large Language Models (MLLMs)
remain vulnerable to deceptive information in prompts and tend to produce
hallucinated responses under such conditions. To quantitatively assess this
vulnerability, we present
MAD-Bench, a carefully curated benchmark that contains 850 test samples divided
into 6 categories, such as non-existent objects, count of objects, spatial
relationship, and visual confusion. We provide a comprehensive analysis of
popular MLLMs, ranging from GPT-4V and Gemini-Pro to open-source models such as
LLaVA-1.5 and CogVLM. Empirically, we observe a significant performance gap
between GPT-4V and the other models, and previously proposed robust
instruction-tuned models, such as LRV-Instruction and LLaVA-RLHF, are not
effective on this new benchmark. While GPT-4V achieves 75.02% accuracy on
MAD-Bench, the accuracy of every other model in our experiments ranges from 5%
to 35%. We further propose a remedy that appends a paragraph to the deceptive
prompts to encourage models to think twice before answering the question.
Surprisingly, this simple method can even double the accuracy; however, the
absolute numbers are still too low to be satisfactory. We hope MAD-Bench can serve as a valuable benchmark
to stimulate further research to enhance models' resilience against deceptive
prompts.
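
The remedy is a form of prompt augmentation: the deceptive question itself is left unchanged, and an extra paragraph is appended before the prompt is sent to the model. A minimal sketch follows, assuming plain-text prompts. The paragraph wording, the helpers `add_think_twice` and `accuracy`, and the `Result` record are hypothetical illustrations, not MAD-Bench's actual prompt or code, and how a response is judged correct is left open.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical wording: the actual paragraph used with MAD-Bench is not given here.
THINK_TWICE_PARAGRAPH = (
    "Before answering, check whether every object, count, and relationship "
    "mentioned in the question actually appears in the image. If the question "
    "rests on a false premise, say so instead of answering as if it were true."
)


def add_think_twice(prompt: str, extra: str = THINK_TWICE_PARAGRAPH) -> str:
    """Return the original (possibly deceptive) prompt with an extra paragraph appended."""
    return f"{prompt.rstrip()}\n\n{extra}"


@dataclass
class Result:
    category: str  # e.g. "non-existent objects", "count of objects"
    correct: bool  # whether the response was judged correct (judging method not shown)


def accuracy(results: list[Result]) -> tuple[float, dict[str, float]]:
    """Overall accuracy and per-category accuracy, both in percent."""
    if not results:
        return 0.0, {}
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for r in results:
        totals[r.category] += 1
        hits[r.category] += int(r.correct)
    overall = 100.0 * sum(hits.values()) / len(results)
    per_category = {c: 100.0 * hits[c] / totals[c] for c in totals}
    return overall, per_category


if __name__ == "__main__":
    # Deceptive if the image contains no cat.
    print(add_think_twice("What color is the cat sitting on the sofa?"))
    toy = [
        Result("non-existent objects", True),
        Result("non-existent objects", False),
        Result("count of objects", False),
        Result("count of objects", False),
    ]
    print(accuracy(toy))  # (25.0, {'non-existent objects': 50.0, 'count of objects': 0.0})
```

Measuring the remedy then means running the same samples with and without `add_think_twice` and comparing the two overall accuracies. The results in the `__main__` block are made-up toy data, not MAD-Bench numbers.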

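Using MAD-Bench, we compare how well several state-of-the-art models withstand deceptive information, and we propose appending a paragraph to deceptive prompts that encourages models to think twice, making them more resilient.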

How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts

Recent breakthroughs in large language models (LLMs) have brought remarkable
success in the field of LLM-as-Agent. Nevertheless, a prevalent assumption is
that the information processed by LLMs is consistently honest, neglecting the
pervasive deceptive or misleading information in human society and AI-generated
content. This oversight makes LLMs susceptible to malicious manipulations,
potentially resulting in detrimental outcomes. This study uses the
intricate Avalon game as a testbed to explore LLMs' potential in deceptive
environments. Avalon, full of misinformation and requiring sophisticated logic,
manifests as a "Game-of-Thoughts". Inspired by the efficacy of humans'
recursive thinking and perspective-taking in the Avalon game, we introduce a
novel framework, Recursive Contemplation (ReCon), to enhance LLMs' ability to
identify and counteract deceptive information. ReCon combines formulation and
refinement contemplation processes; formulation contemplation produces initial
thoughts and speech, while refinement contemplation further polishes them.
Additionally, we incorporate first-order and second-order perspective
transitions into these processes, respectively. Specifically, the first-order
transition allows an LLM agent to infer others' mental states, while the
second-order transition involves understanding how others perceive the agent's
mental state. After integrating ReCon with different LLMs, extensive
experimental results from the Avalon game indicate its efficacy in helping LLMs
discern and maneuver around deceptive information without extra fine-tuning or
data. Finally, we offer a
possible explanation for the efficacy of ReCon and explore the current
limitations of LLMs in terms of safety, reasoning, speaking style, and format,
potentially furnishing insights for subsequent research.
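
As described, ReCon is a prompting procedure layered on an unchanged LLM, so its control flow can be sketched in a few lines. This is a minimal sketch, assuming the model is any `llm(prompt) -> str` function; the prompt wording and the helper names `formulation_contemplation`, `refinement_contemplation`, and `recon_turn` are hypothetical, not the authors' code, and the role name and game context in the usage lines are made up. Following the abstract, the first-order perspective transition sits in formulation and the second-order transition in refinement.

```python
from typing import Callable

# Any function mapping a prompt to a completion (assumption: plain text-in/text-out LLM).
LLM = Callable[[str], str]


def formulation_contemplation(llm: LLM, context: str, role: str) -> tuple[str, str]:
    """Produce initial thoughts and speech, using a first-order perspective transition."""
    # First-order: infer the other players' mental states before forming thoughts.
    thought = llm(
        f"You are {role} in Avalon.\nGame so far:\n{context}\n"
        "Infer what each other player believes and intends, then write your private thoughts."
    )
    speech = llm(f"Your private thoughts:\n{thought}\nDraft what you will say to the group.")
    return thought, speech


def refinement_contemplation(
    llm: LLM, context: str, role: str, thought: str, speech: str
) -> tuple[str, str]:
    """Polish the initial thoughts and speech, using a second-order perspective transition."""
    # Second-order: estimate how the others would perceive the agent if it spoke this way.
    perception = llm(
        f"You are {role} in Avalon.\nGame so far:\n{context}\nDraft speech:\n{speech}\n"
        "How would the other players interpret this speech and your intentions?"
    )
    refined_thought = llm(
        f"Initial thoughts:\n{thought}\nHow others may perceive you:\n{perception}\n"
        "Revise your thoughts."
    )
    refined_speech = llm(
        f"Revised thoughts:\n{refined_thought}\nDraft speech:\n{speech}\n"
        "Rewrite the speech so it does not reveal what you need to hide."
    )
    return refined_thought, refined_speech


def recon_turn(llm: LLM, context: str, role: str) -> tuple[str, str]:
    """One ReCon-style turn: formulation followed by refinement, with no fine-tuning."""
    thought, speech = formulation_contemplation(llm, context, role)
    return refinement_contemplation(llm, context, role, thought, speech)


if __name__ == "__main__":
    calls: list[str] = []

    def echo_llm(prompt: str) -> str:
        # Stub model for illustration; replace with a real LLM call.
        calls.append(prompt)
        return f"<response {len(calls)}>"

    print(recon_turn(echo_llm, "Round 1: Alice proposed a team of Bob and Carol.", "Merlin"))
    # ('<response 4>', '<response 5>')
```

Because every step is another call to the same unmodified model, nothing here requires fine-tuning or extra data, which is consistent with the abstract's claim; the cost is several model calls per turn (five in this sketch).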

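Recursive Contemplation (ReCon), a new framework for identifying and countering deceptive information, improves the ability of large language models to recognize and maneuver around such information. Extensive experiments on the Avalon game demonstrate ReCon's effectiveness without additional fine-tuning or data.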