BriefGPT.xyz
May, 2023
Can Large Language Models Be an Alternative to Human Evaluations?
Cheng-Han Chiang, Hung-yi Lee
TL;DR
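This paper examines the potential of using large language models (LLMs) in place of human evaluation to assess AI-generated text. It studies LLM evaluation on two natural language processing tasks, open-ended story generation and adversarial attacks, and finds that the LLM evaluation results are consistent with those of human experts.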
Abstract
Human evaluation is indispensable and inevitable for assessing the quality of texts generated by machine learning models or written by humans. However, human evaluation is very difficult to reproduce and its qual