BriefGPT.xyz
Jun, 2024
HalluDial: A Large-Scale Benchmark for Automatic Dialogue-Level Hallucination Evaluation
Wen Luo, Tianshu Shen, Wei Li, Guangyue Peng, Richeng Xuan...
TL;DR
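This paper proposes HalluDial, the first comprehensive large-scale benchmark for automatic dialogue-level hallucination evaluation. HalluDial covers both spontaneous and induced hallucination scenarios, and includes both factuality and faithfulness hallucinations.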
Abstract
Large language models (LLMs) have significantly advanced the field of Natural Language Processing (NLP), achieving remarkable performance across diverse tasks and enabling widespread real-world applications. However, LLMs are prone to...