BriefGPT.xyz
Nov, 2023
Digital Socrates: Evaluating LLMs through explanation critiques
Yuling Gu, Oyvind Tafjord, Peter Clark
TL;DR
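By defining the explanation critiquing task, building a dataset, and using mathematical analysis, we propose the Digital Socrates model, which can automatically evaluate the explanation capabilities of LLMs both quantitatively and qualitatively, filling an important gap in evaluation tools for model explanation behavior.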
Abstract
While LLMs can provide reasoned explanations along with their answers, the nature and quality of those explanations are still poorly understood. In response, our goal is to define a detailed way of characterizing the ex