May, 2021

Towards Standard Criteria for Human Evaluation of Chatbots: A Survey

Hongru Liang, Huaqing Li

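TL;DR: A comprehensive survey of 105 papers involving human evaluation of chatbots. It proposes five standard evaluation criteria with precise definitions to address the reliability and replication problems caused by highly diverse evaluation criteria.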

Abstract

Human evaluation is becoming a necessity for testing the performance of chatbots. However, off-the-shelf settings suffer from severe reliability