Existing vision-language models (VLMs) often suffer from visual hallucination, where the generated responses contain inaccuracies that are not grounded in the visual input. Efforts to address this issue without model finetuning primarily mitigate hallucination by contrastively reducing biases or by amplifying the weights of visual embeddings during decoding. However, these approaches improve visual perception at the cost of impairing language reasoning capability. In this work, we propose the Perception Magnifier (PM), a novel visual decoding method that iteratively isolates relevant visual tokens based on attention and magnifies the corresponding regions, spurring the model to concentrate on fine-grained visual details during decoding. Specifically, by magnifying critical regions while preserving the structural and contextual information at each decoding step, PM allows the VLM to enhance its scrutiny of the visual input, hence producing more accurate and faithful responses. Extensive experimental results demonstrate that PM not only achieves superior hallucination mitigation but also enhances language generation while preserving strong reasoning capabilities. Code is available at https://github.com/ShunqiM/PM.
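
To make the idea of attention-guided magnification concrete, the following is a minimal sketch of a single magnification step, not the authors' implementation (which is in the linked repository). It assumes the attention over visual tokens has already been aggregated into a small patch-grid map, and it uses separable inverse-CDF resampling as a stand-in warping technique: regions with more attention mass receive more output pixels, while the ordering of rows and columns is preserved. The function names `sample_indices` and `magnify` and the `floor` parameter are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left
from itertools import accumulate


def sample_indices(density, size):
    """Pick `size` source indices so dense regions are sampled more often.

    `density` holds one positive weight per patch along one axis.
    """
    patches = len(density)
    # Spread each patch weight over the pixels that patch covers.
    per_pixel = [density[i * patches // size] for i in range(size)]
    total = sum(per_pixel)
    cdf = list(accumulate(p / total for p in per_pixel))
    # Inverse-CDF lookup at evenly spaced targets: where the CDF is steep
    # (high attention) many targets land, so that region is stretched.
    targets = [(k + 0.5) / size for k in range(size)]
    return [min(bisect_left(cdf, t), size - 1) for t in targets]


def magnify(image, attention, floor=0.5):
    """Warp a 2-D `image` (list of rows) toward high-attention patches.

    `attention` is a patch-grid map (list of rows) of non-negative scores
    with a nonzero sum. `floor` is a hypothetical knob: the share of
    uniform density mixed in so that low-attention context keeps a
    nonzero sampling density instead of being ignored.
    """
    height, width = len(image), len(image[0])
    grid_h, grid_w = len(attention), len(attention[0])
    total = sum(map(sum, attention))
    # Marginal attention mass along each axis.
    row_mass = [sum(r) / total for r in attention]
    col_mass = [sum(r[j] for r in attention) / total for j in range(grid_w)]
    dens_y = [floor / grid_h + (1 - floor) * m for m in row_mass]
    dens_x = [floor / grid_w + (1 - floor) * m for m in col_mass]
    rows = sample_indices(dens_y, height)
    cols = sample_indices(dens_x, width)
    # Output keeps the input size; source indices are non-decreasing,
    # so the layout is stretched and squeezed but never reordered.
    return [[image[r][c] for c in cols] for r in rows]
```

In an iterative decoding loop, one would presumably recompute the attention map at each step and feed the re-magnified image back to the model; that loop depends on the specific VLM and is left out here. With a uniform attention map the sketch returns the image unchanged, which is a convenient sanity check.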

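This paper examines visual hallucination in existing vision-language models (VLMs), where generated responses are inconsistent with the visual input. We propose the Perception Magnifier (PM), a novel visual decoding method that iteratively isolates relevant visual tokens and magnifies the corresponding regions, strengthening the model's fine-grained visual analysis and improving the accuracy and plausibility of its generated language.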

Through the Magnifying Glass: Adaptive Perception Magnification for Hallucination-Free VLM Decoding