Empathetic Conversational Systems (ECS) are built to respond empathetically to the user's emotions and sentiments, regardless of the application domain. Evaluation approaches in current ECS studies are restricted to offline evaluation experiments, primarily for gold-standard comparison and benchmarking, and user evaluation studies that collect human ratings on specific constructs. These methods are inadequate for measuring the actual quality of empathy in conversations. In this paper, we propose a multidimensional empathy evaluation framework with three new methods for measuring empathy at (i) the structural level, using three empathy-related dimensions, (ii) the behavioral level, using empathy behavioral types, and (iii) the overall level, using an empathy lexicon, thereby strengthening the evaluation process. Experiments were conducted with state-of-the-art ECS models and large language models (LLMs) to show the framework's usefulness.

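This study addresses the problem that existing evaluation methods for empathetic conversational systems are inadequate for measuring the quality of empathy in conversations. It proposes a multidimensional empathy evaluation framework comprising three new methods at the structural, behavioral, and overall levels, significantly strengthening the evaluation process. Experimental results indicate that the framework has potential impact on evaluating the effectiveness of modern empathetic conversational systems.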

Constructing a Multidimensional Evaluation Framework for Empathetic Conversational Systems