May, 2024
Grade Like a Human: Rethinking Automated Assessment with Large Language Models
Wenjing Xie, Juxin Niu, Chun Jason Xue, Nan Guan
TL;DR
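We propose a grading system based on large language models. It develops grading rubrics, produces accurate and consistent scores with customized feedback, and performs post-grading review. Extensive experiments on a new dataset validate the effectiveness of our approach.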
Abstract
While large language models (LLMs) have been used for automated grading, they have not yet matched human performance, especially when grading complex questions. Existing research