MENLI: Robust Evaluation Metrics from Natural Language Inference

August 2022

Yanran Chen, Steffen Eger

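TL;DR: This paper proposes evaluation metrics based on natural language inference (NLI). They are more robust than earlier BERT-based evaluation metrics, and combining them with other evaluation metrics improves both robustness and quality scores.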
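To make the idea of combining metrics concrete, here is a minimal sketch of one possible scheme: a weighted average of an NLI-based score and a score from another metric, after rescaling each to [0, 1]. The summary above does not specify the combination formula, so the min-max normalization, the linear weighting, and the names `min_max_normalize`, `combine_scores`, and `nli_weight` are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def combine_scores(
    nli_scores: Sequence[float],
    base_scores: Sequence[float],
    nli_weight: float = 0.5,  # hypothetical parameter, not taken from the source
) -> List[float]:
    """Weighted average of normalized NLI scores and normalized base-metric scores.

    Assumption: both inputs are per-example scores where higher means better.
    """
    if len(nli_scores) != len(base_scores):
        raise ValueError("both score lists must have the same length")
    if not 0.0 <= nli_weight <= 1.0:
        raise ValueError("nli_weight must lie in [0, 1]")
    nli_norm = min_max_normalize(nli_scores)
    base_norm = min_max_normalize(base_scores)
    return [
        nli_weight * n + (1.0 - nli_weight) * b
        for n, b in zip(nli_norm, base_norm)
    ]


if __name__ == "__main__":
    # Toy per-example scores, invented for illustration only.
    nli = [0.1, 0.9, 0.5]
    base = [0.80, 0.85, 0.95]
    print(combine_scores(nli, base, nli_weight=0.3))
```

Under this sketch, setting `nli_weight` to 0 recovers the normalized base metric and setting it to 1 recovers the normalized NLI score, so the weight controls how much each signal contributes.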

Abstract

Recently proposed BERT-based evaluation metrics perform well on standard evaluation benchmarks but are vulnerable to adversarial attacks, e.g., relating to factuality errors. We argue that this stems (in part) fr