Why We Need New Evaluation Metrics for Natural Language Generation

Jul, 2017


Why We Need New Evaluation Metrics for NLG

Jekaterina Novikova, Ondřej Dušek, Amanda Cercas Curry, Verena Rieser

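TL;DR This paper examines the limitations of the automatic metrics commonly used in NLG evaluation and argues for new, system- and data-independent evaluation methods. It investigates a wide range of metrics, including state-of-the-art word-based and grammar-based ones. The experiments show that these metrics do not fully reflect human judgements and that their performance depends on the dataset and the system. Nevertheless, automatic evaluation can still support system development by identifying cases where a system performs poorly.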
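This excerpt does not spell out how agreement between a metric and human judgements is measured. A common approach, assumed here, is Spearman's rank correlation between per-output metric scores and per-output human ratings: a value near 1 means the metric ranks outputs the way humans do, a value near 0 means it barely reflects them. The sketch below is a minimal standard-library implementation; the helper names `average_ranks` and `spearman`, and all scores in the usage example, are hypothetical and invented for illustration, not data or code from the paper.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values 1..n in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Hypothetical metric scores (e.g. a word-overlap metric) for five system outputs,
    # paired with hypothetical human ratings on a 1-5 scale for the same outputs.
    metric_scores = [0.42, 0.31, 0.55, 0.18, 0.47]
    human_ratings = [4, 3, 3, 2, 5]
    print(round(spearman(metric_scores, human_ratings), 3))  # prints 0.564
```

Ranks are used instead of raw scores because metric values and human ratings live on different scales; only the ordering of outputs is compared, and tied human ratings (common on short Likert scales) are handled by rank averaging.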

Abstract

The majority of NLG evaluation relies on automatic metrics, such as BLEU. In this paper, we motivate the need for novel, system- and data-independent automatic evaluation methods: We investigate a wide range of