May 2023
A Critical Evaluation of Evaluations for Long-form Question Answering
Fangyuan Xu, Yixiao Song, Mohit Iyyer, Eunsol Choi
TL;DR
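A targeted study of how long-form answers are evaluated. It stresses that evaluation must cover multiple dimensions and finds that automatic text-generation metrics do not predict human preferences. It recommends that future evaluations address several aspects, such as accuracy, completeness, and objectivity.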
Abstract
Long-form question answering (LFQA) enables answering a wide range of questions, but its flexibility poses enormous challenges for evaluation. We perform the first targeted study of the