May, 2023
Evaluating Open Question Answering Evaluation
Cunxiang Wang, Sirui Cheng, Zhikun Xu, Bowen Ding, Yidong Wang...
TL;DR
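This study addresses the evaluation of the Open Question Answering task in the field of cognitive intelligence and proposes a QA Evaluation task with a corresponding dataset. Given the limitations of automatic evaluation methods, it uses human evaluation to measure more accurately the accuracy and F1 score of AI-generated answers. It also investigates which evaluation methods correlate highly with human judgment and are more reliable, and where current methods fall short. The resulting dataset is expected to support the development of more effective automatic evaluation tools.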
Abstract
This study focuses on the evaluation of open question answering (Open-QA) tasks, which have become vital in the realm of artificial intelligence. Current automatic evaluation methods have shown limitations, indic