BriefGPT.xyz
Apr, 2025
C-FAITH: A Chinese Fine-Grained Benchmark for Automated Hallucination Evaluation
Xu Zhang, Zhifei Liu, Jiahao Wang, Huixuan Zhang, Fan Xu...
TL;DR
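This study addresses the shortage of evaluation resources for hallucinations generated by large language models, particularly in Chinese. By introducing the HaluAgent framework, the authors automatically construct C-FAITH, a fine-grained question-answering dataset, which makes hallucination evaluation more efficient and less costly. Experimental results show that the benchmark can effectively evaluate the performance of mainstream large language models, advancing research in this area.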
Abstract
Despite the rapid advancement of large language models, they remain highly susceptible to generating hallucinations, which significantly hinders their widespread application. Hallucination research requires dynam