May, 2021
Towards Standard Criteria for Human Evaluation of Chatbots: A Survey
Hongru Liang, Huaqing Li
TL;DR
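A comprehensive survey of 105 papers involving human evaluation of chatbots. It proposes five standard evaluation criteria with precise definitions, to address the reliability and replication problems caused by highly diverse criteria.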
Abstract
Human evaluation is becoming a necessity to test the performance of chatbots. However, off-the-shelf settings suffer from severe reliability …