Hallucination is a well-known phenomenon in text generated by large language models (LLMs). Hallucinated responses appear in almost all application scenarios, e.g., summarization and question answering (QA). For applications requiring high reliability (e.g., customer-facing assistants), the potential for hallucination in LLM-generated text is a critical problem. The amount of hallucination can be reduced by leveraging information retrieval to provide relevant background information to the LLM. However, LLMs can still generate hallucinated content for various reasons (e.g., prioritizing their parametric knowledge over the context, or failing to capture the relevant information from the context). Detecting hallucinations through automated methods is thus paramount. To facilitate research in this direction, we introduce a sophisticated dataset, DelucionQA, that captures hallucinations made by retrieval-augmented LLMs for a domain-specific QA task. Furthermore, we propose a set of hallucination detection methods to serve as baselines for future work from the research community. Analysis and a case study are also provided to share valuable insights on hallucination phenomena in the target scenario.

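Text generated by large language models (LLMs) contains hallucinations. Information retrieval can reduce the amount of hallucination, but hallucinations still arise for various reasons. To facilitate research in this direction, we introduce a sophisticated dataset, DelucionQA, that captures hallucinations made by retrieval-augmented LLMs in a domain-specific question-answering task, and we propose a set of hallucination detection methods as baselines for future research. Analysis and a case study also provide valuable insights into hallucination phenomena in the target scenario.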

DelucionQA: Detecting Hallucinations in Domain-specific Question Answering