Large Language Models (LLMs) have demonstrated remarkable performance across various tasks. However, they are prone to contextual hallucination, generating information that is either unsubstantiated or contradictory to the given context. Although many studies have investigated contextual hallucinations in LLMs, addressing them in long-context inputs remains an open problem. In this work, we take an initial step toward solving this problem by constructing a dataset specifically designed for long-context hallucination detection. Furthermore, we propose a novel architecture that enables pre-trained encoder models, such as BERT, to process long contexts and effectively detect contextual hallucinations through a decomposition and aggregation mechanism. Our experimental results show that the proposed architecture significantly outperforms previous models of similar size as well as LLM-based models across various metrics, while providing substantially faster inference.

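This work addresses the problem of large language models (LLMs) generating information that is unsubstantiated by, or contradictory to, the context when given long-context inputs. It constructs a dataset specifically for long-context hallucination detection and proposes a new architecture that enables pre-trained encoder models to process long contexts and detect hallucinations effectively. Experimental results show that the architecture significantly outperforms previous models across various metrics while providing faster inference.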

Research on long-context hallucination detection
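
The abstract names a decomposition and aggregation mechanism but does not spell out its details. The sketch below is only a generic illustration of that idea and not the paper's architecture: a long context is split into overlapping chunks short enough for a fixed-length encoder, each chunk is scored against the response, and the per-chunk scores are pooled into a single decision. The function names, the chunk size, the stride, the max-pooling rule, the threshold, and the toy word-overlap scorer (a stand-in for a BERT-style encoder) are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def split_into_chunks(tokens: Sequence[str], chunk_size: int, stride: int) -> List[List[str]]:
    """Decomposition step: cut a long token sequence into overlapping chunks."""
    if chunk_size <= 0 or stride <= 0:
        raise ValueError("chunk_size and stride must be positive")
    chunks: List[List[str]] = []
    start = 0
    while True:
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last chunk reached the end of the sequence
        start += stride
    return chunks


def overlap_support(chunk: Sequence[str], response: Sequence[str]) -> float:
    """Toy scorer (assumption): fraction of response tokens found in the chunk.

    A real system would replace this with an encoder-based score.
    """
    if not response:
        return 1.0
    chunk_vocab = set(chunk)
    return sum(tok in chunk_vocab for tok in response) / len(response)


def is_hallucinated(
    context: Sequence[str],
    response: Sequence[str],
    score_chunk: Callable[[Sequence[str], Sequence[str]], float] = overlap_support,
    chunk_size: int = 4,
    stride: int = 2,
    threshold: float = 0.5,
) -> bool:
    """Aggregation step: max-pool chunk support scores and apply a threshold.

    The response is flagged when no single chunk supports it well enough.
    Max-pooling is an illustrative choice; a learned aggregator is equally possible.
    """
    scores = [score_chunk(chunk, response) for chunk in split_into_chunks(context, chunk_size, stride)]
    return max(scores) < threshold


context = "the cat sat on the mat while the dog slept by the door".split()
print(is_hallucinated(context, "the dog slept".split()))
print(is_hallucinated(context, "the bird flew".split()))
```

Overlapping chunks are used here so that a statement straddling a chunk boundary still appears whole in at least one chunk; the overlap size trades extra computation for that coverage.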