PairEval: Open-domain Dialogue Evaluation with Pairwise Comparison

April 2024

ChaeHun Park, Minseok Choi, Dohyun Lee, Jaegul Choo

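TL;DR: The paper proposes PairEval, a dialogue evaluation metric that assesses a response by comparing it against other dialogue responses. PairEval is more robust than baseline metrics and correlates more strongly with human judgments.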
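To make the idea of comparison-based scoring concrete, here is a minimal sketch of one way to read it: a candidate response is scored by the fraction of comparison responses it is preferred over. This is an illustration under stated assumptions, not the authors' implementation. The names `pairwise_score` and `toy_prefer` are hypothetical, and the comparator is a toy word-overlap heuristic standing in for whatever learned comparison model a real metric would use.

```python
from typing import Callable, Sequence

# Hypothetical comparator signature (an assumption for this sketch):
# returns True if `candidate` is judged better than `other` for `context`.
Comparator = Callable[[str, str, str], bool]


def pairwise_score(
    context: str,
    candidate: str,
    comparisons: Sequence[str],
    prefer: Comparator,
) -> float:
    """Score a candidate as its win rate against the comparison responses."""
    if not comparisons:
        raise ValueError("at least one comparison response is required")
    wins = sum(1 for other in comparisons if prefer(context, candidate, other))
    return wins / len(comparisons)


def toy_prefer(context: str, candidate: str, other: str) -> bool:
    """Toy stand-in comparator: prefer the response that shares more words
    with the context. A real metric would use a trained model here."""
    context_words = set(context.lower().split())

    def overlap(response: str) -> int:
        return len(context_words & set(response.lower().split()))

    return overlap(candidate) > overlap(other)


if __name__ == "__main__":
    score = pairwise_score(
        context="do you like hiking in the mountains",
        candidate="yes i like hiking a lot",
        comparisons=["what time is it", "i like the mountains"],
        prefer=toy_prefer,
    )
    print(score)
```

Keeping the comparator as a parameter separates the aggregation step (win rate) from the judgment step, so the toy heuristic can be swapped for a model-based comparison without changing the scoring loop.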

Abstract

Building a reliable and automated evaluation metric is a necessary but challenging problem for open-domain dialogue systems. Recent studies proposed evaluation metrics that assess generated responses by consideri