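TL;DR: This paper explores using word embeddings to compute semantic similarity between summaries. The aim is to correct the bias of ROUGE, the conventional automatic evaluation method, which relies on lexical overlap. Experimental results show that the embedding-based method agrees with human judgements more closely than the conventional one.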
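The difference between lexical overlap and embedding-based similarity can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and is not the paper's exact formulation. `rouge1_recall` computes ordinary ROUGE-1 recall, which is clipped unigram overlap divided by the number of reference unigrams. `soft_unigram_recall` is a hypothetical simplification: it gives full credit to an exact match, and otherwise credits a reference word with its best cosine similarity to any candidate word. `TOY_EMB` holds made-up 3-dimensional vectors that stand in for real pretrained word embeddings.

```python
import math
from collections import Counter

# Hypothetical toy embeddings standing in for real pretrained word vectors.
TOY_EMB = {
    "the": [0.1, 0.1, 0.1],
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.85, 0.2, 0.0],
    "sat": [0.0, 0.9, 0.1],
    "rested": [0.1, 0.8, 0.2],
}

def rouge1_recall(candidate: str, reference: str) -> float:
    """Standard ROUGE-1 recall: clipped unigram overlap / reference unigram count."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(count, cand[tok]) for tok, count in ref.items())
    return overlap / max(sum(ref.values()), 1)

def cosine(u, v) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def soft_unigram_recall(candidate: str, reference: str, emb) -> float:
    """Hypothetical soft recall: exact matches score 1, other words score best cosine."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not ref:
        return 0.0
    total = 0.0
    for r in ref:
        if r in cand:
            total += 1.0  # exact lexical match gets full credit, as in ROUGE
        elif r in emb:
            # Best similarity to any candidate word that has a vector; clamp at 0.
            sims = [cosine(emb[r], emb[c]) for c in cand if c in emb]
            total += max([0.0] + sims)
        # Out-of-vocabulary words with no exact match contribute 0.
    return total / len(ref)

reference = "the cat sat"
candidate = "the kitten rested"
lexical = rouge1_recall(candidate, reference)  # 1/3: only "the" overlaps
semantic = soft_unigram_recall(candidate, reference, TOY_EMB)
print(f"ROUGE-1 recall: {lexical:.3f}, soft recall: {semantic:.3f}")
```

With these toy vectors, the paraphrase "the kitten rested" shares only "the" with "the cat sat", so ROUGE-1 recall is 1/3. The soft score is close to 1 because "kitten"/"cat" and "rested"/"sat" have nearly parallel vectors. For brevity, the sketch omits count clipping in the soft score and higher-order n-grams, which a real evaluator would need.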
Abstract
ROUGE is a widely adopted, automatic evaluation measure for text
summarization. While it has been shown to correlate well with human judgements,
it is biased towards surface lexical similarities. This makes it un