Large Language Models (LLMs) are increasingly used in various contexts, yet remain prone to generating non-factual content, commonly referred to as "hallucinations". The literature categorizes hallucinations into several types, including entity-level, relation-level, and sentence-level hallucinations. However, existing hallucination datasets often fail to capture fine-grained hallucinations in multilingual settings. In this work, we introduce HalluVerse25, a multilingual LLM hallucination dataset that categorizes fine-grained hallucinations in English, Arabic, and Turkish. Our dataset construction pipeline uses an LLM to inject hallucinations into factual biographical sentences, followed by a rigorous human annotation process to ensure data quality. We evaluate several LLMs on HalluVerse25, providing valuable insights into how proprietary models perform in detecting LLM-generated hallucinations across different contexts.

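This work addresses the lack of fine-grained annotated datasets for non-factual content generated by large language models (LLMs), i.e., hallucinations, in multilingual settings. We propose the HalluVerse25 dataset, which uses an LLM to generate hallucinations and relies on human annotation to ensure data quality, providing a resource for evaluating the detection of multilingual hallucinations. The results offer important insights into LLM hallucination detection across different contexts.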

HalluVerse25: A Fine-Grained Multilingual Benchmark Dataset for LLM Hallucinations