Large Vision-Language Models (LVLMs) have demonstrated remarkable performance on complex multimodal tasks. However, they remain prone to object hallucination: the misidentification or misclassification of objects present in images. To study this problem, we propose HALLUCINOGEN, a novel visual question answering (VQA) object hallucination attack benchmark that uses diverse contextual reasoning prompts to evaluate object hallucination in state-of-the-art LVLMs. We design a series of contextual reasoning hallucination prompts that test LVLMs' ability to accurately identify objects in a target image while asking them to perform diverse vision-language tasks such as identifying, locating, or performing visual reasoning around specific objects. Further, we extend our benchmark to high-stakes medical applications and introduce MED-HALLUCINOGEN, a set of hallucination attacks tailored to the biomedical domain, and evaluate LVLMs' hallucination behavior on medical images, an area where precision is critical. Finally, we conduct extensive evaluations of eight LVLMs and two hallucination mitigation strategies across multiple datasets and show that current generic and medical LVLMs remain susceptible to hallucination attacks.

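This work addresses the problem of object hallucination in large vision-language models (LVLMs) and proposes the HALLUCINOGEN benchmark, which evaluates how accurately these models identify objects in images across multimodal tasks. By designing diverse contextual reasoning hallucination prompts, the work not only broadens how LVLMs are evaluated but also extends to high-stakes medical applications with MED-HALLUCINOGEN, revealing the vulnerability of current models to hallucination on medical images, which has important implications for medical precision.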

HALLUCINOGEN: A Benchmark for Evaluating Object Hallucination in Large Vision-Language Models