Large Language Models (LLMs) have demonstrated remarkable capabilities across a variety of tasks in different domains. However, they sometimes generate responses that are logically coherent but factually incorrect or misleading, a phenomenon known as LLM hallucination. Data-driven supervised methods train hallucination detectors on the internal states of LLMs, but detectors trained on one domain often fail to generalize to others. In this paper, we aim to improve the cross-domain performance of supervised detectors using only in-domain data. We propose PRISM, a novel framework of prompt-guided internal states for hallucination detection in LLMs. By using appropriate prompts to guide changes in the structure related to text truthfulness within the LLM's internal states, we make this structure more salient and more consistent across texts from different domains. We integrate our framework with existing hallucination detection methods and conduct experiments on datasets from different domains. The experimental results indicate that our framework significantly enhances the cross-domain generalization of existing hallucination detection methods.
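
To make the idea concrete, the following is a minimal sketch, not the authors' implementation, of a supervised detector trained on prompt-guided internal states. The prompt wording in `PROMPT_TEMPLATE`, all helper names, and the synthetic vectors that stand in for real LLM hidden states are assumptions made purely for illustration. In practice the vectors would be extracted from the LLM after feeding it the prompted text, and that extraction step is omitted here.

```python
import math
import random

# Hypothetical prompt wording; the actual prompt used by PRISM is not given here.
PROMPT_TEMPLATE = "Does the statement '{text}' describe a correct fact?"


def build_prompt(text: str) -> str:
    """Wrap a statement in the (assumed) truthfulness-oriented prompt."""
    return PROMPT_TEMPLATE.format(text=text)


def make_synthetic_states(n, dim=4, seed=0):
    """Placeholder for LLM internal states: label 1 (truthful) is shifted
    by +1 along dimension 0, label 0 (hallucinated) by -1, plus noise."""
    rng = random.Random(seed)
    states, labels = [], []
    for i in range(n):
        y = i % 2
        x = [rng.gauss(0.0, 0.3) for _ in range(dim)]
        x[0] += 1.0 if y == 1 else -1.0
        states.append(x)
        labels.append(y)
    return states, labels


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(states, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression probe with full-batch gradient descent."""
    dim, n = len(states[0]), len(states)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(states, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x) -> int:
    """Return 1 (truthful) or 0 (hallucinated) for one state vector."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


if __name__ == "__main__":
    print(build_prompt("Paris is the capital of France."))
    train_x, train_y = make_synthetic_states(40, seed=0)
    w, b = train_probe(train_x, train_y)
    test_x, test_y = make_synthetic_states(40, seed=1)
    acc = sum(predict(w, b, x) == y for x, y in zip(test_x, test_y)) / len(test_x)
    print(f"accuracy on synthetic held-out states: {acc:.2f}")
```

The probe here is a plain logistic regression chosen for brevity; any existing supervised detector that consumes internal states could take its place, which is the sense in which the framework is described as integrating with existing methods.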

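This paper studies responses generated by large language models (LLMs) that are logically coherent but factually incorrect (i.e., hallucinations), with the aim of improving the cross-domain performance of existing supervised detectors. We propose a novel framework, PRISM, which uses appropriate prompts to guide changes in the structure related to text truthfulness within LLMs' internal states, making that structure more salient and more consistent across texts from different domains. Experimental results show that the framework significantly enhances the cross-domain generalization of existing hallucination detection methods.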

Prompt-Guided Internal States for Hallucination Detection of Large Language Models