The Two Sides of the Coin: Hallucination Generation and Detection with LLMs as Evaluators for LLMs

Jul, 2024

Anh Thu Maria Bui, Saskia Felizitas Brech, Natalie Hußfeldt, Tobias Jennert, Melanie Ullrich...

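TL;DR This paper examines how four large language models (LLMs), Llama 3, Gemma, GPT-3.5 Turbo, and GPT-4, perform on hallucination generation and hallucination detection. For the detection task, it also combines all four models through ensemble majority voting. The results offer insight into the strengths and weaknesses of these models on both tasks.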
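As a rough illustration of ensemble majority voting over the four detectors, here is a minimal sketch. The label strings, the `majority_vote` helper, and the tie-breaking rule are assumptions made for this example; the summary does not say how the authors resolve a 2-2 split or how they encode labels.

```python
from collections import Counter


def majority_vote(labels, tie_label="hallucination"):
    """Return the most frequent label among the model predictions.

    tie_label is a hypothetical fallback for a split vote (possible
    with four voters); the source does not specify a tie-break rule.
    """
    if not labels:
        raise ValueError("at least one prediction is required")
    ranked = Counter(labels).most_common()
    # A tie exists when the top two labels have the same count.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_label
    return ranked[0][0]


# Hypothetical per-model predictions for a single example.
predictions = {
    "Llama 3": "hallucination",
    "Gemma": "not_hallucination",
    "GPT-3.5 Turbo": "hallucination",
    "GPT-4": "hallucination",
}

final_label = majority_vote(list(predictions.values()))
```

With an even number of voters, the choice of tie-break can shift precision and recall, so it is worth fixing that rule before comparing the ensemble against the individual models.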

Abstract

Hallucination detection in Large Language Models (LLMs) is crucial for ensuring their reliability. This work presents our participation in the CLEF ELOQUENT HalluciGen shared task, where the goal is to develop