Large vision-language models (LVLMs), designed to interpret and respond to human instructions, occasionally generate hallucinated or harmful content when given inappropriate instructions. This study uses linear probing to shed light on the hidden knowledge at the output layer of LVLMs. We demonstrate that the logit distributions of the first tokens contain sufficient information to determine whether to respond to the instructions, including recognizing unanswerable visual questions, defending against multi-modal jailbreaking attacks, and identifying deceptive questions. Such hidden knowledge is gradually lost in the logits of subsequent tokens during response generation. We then illustrate a simple decoding strategy, applied at the generation of the first token, that effectively improves the generated content. In experiments, we report a few interesting findings: First, the CLIP model already contains a strong signal for solving these tasks, indicating potential bias in the existing datasets. Second, we observe performance improvements from utilizing the first logit distributions on three additional tasks: indicating uncertainty in math solving, mitigating hallucination, and image classification. Last, with the same training data, simply finetuning LVLMs improves model performance but is still inferior to linear probing on these tasks.
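As a rough illustration of what linear probing on first-token logits means, the following minimal sketch fits a logistic-regression probe on synthetic vectors that stand in for first-token logit distributions. Everything here is an assumption made for illustration: the data is randomly generated rather than taken from an LVLM, the vocabulary has only 8 entries, and the helper names (`make_synthetic_logits`, `train_linear_probe`, `accuracy`) are hypothetical. It is not the training setup used in the study.

```python
import math
import random


def make_synthetic_logits(n, vocab_size, rng):
    """Hypothetical stand-in for first-token logits (not real LVLM output).

    Label 1 means "the model should not answer"; label 0 means "answer".
    """
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        logits = [rng.gauss(0.0, 1.0) for _ in range(vocab_size)]
        # Assumption for illustration: vocabulary index 0 behaves like a
        # refusal-style token whose logit is shifted up when label is 1.
        logits[0] += 2.0 if label == 1 else -2.0
        data.append((logits, label))
    return data


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, b, x):
    # Linear probe: a single weight vector and bias over the logit vector.
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_probe(data, epochs=200, lr=0.1):
    """Fit logistic regression with full-batch gradient descent."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            err = sigmoid(score(w, b, x)) - y  # gradient of the log loss
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, data):
    correct = sum((score(w, b, x) >= 0.0) == (y == 1) for x, y in data)
    return correct / len(data)


rng = random.Random(0)
train_set = make_synthetic_logits(200, 8, rng)
test_set = make_synthetic_logits(100, 8, rng)
probe_w, probe_b = train_linear_probe(train_set)
test_acc = accuracy(probe_w, probe_b, test_set)
print(f"held-out probe accuracy: {test_acc:.2f}")
```

In a real setting, the synthetic vectors would be replaced by the logit distribution an LVLM produces for the first generated token, and the probe's prediction could then be used to decide whether the model should answer at all.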

The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models?