Hallucinations in large language models (LLMs) are responses that are coherent yet factually inaccurate. This issue undermines the effectiveness of LLMs in practical applications and motivates research into detecting and mitigating hallucinations. Previous studies have mainly concentrated on post-processing techniques for hallucination detection, which tend to be computationally intensive and limited in effectiveness because they are separate from the LLM's inference process. To overcome these limitations, we introduce MIND, an unsupervised training framework that leverages the internal states of LLMs for real-time hallucination detection without requiring manual annotations. Additionally, we present HELM, a new benchmark for evaluating hallucination detection across multiple LLMs, featuring diverse LLM outputs and the internal states of LLMs during their inference process. Our experiments demonstrate that MIND outperforms existing state-of-the-art methods in hallucination detection.

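Hallucinations in LLMs are responses that are logically coherent but factually inaccurate. This paper introduces MIND, an unsupervised training framework that uses the internal states of LLMs to detect hallucinations in real time without manual annotations, and proposes HELM, a new benchmark for evaluating hallucination detection across multiple LLMs. Our experiments show that MIND outperforms existing state-of-the-art methods in hallucination detection.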

Unsupervised Real-Time Hallucination Detection Based on the Internal States of Large Language Models