Automatic Metrics in Natural Language Generation: A Survey of Current Evaluation Practices

Aug, 2024

Patrícia Schmidtová, Saad Mahamood, Simone Balloccu, Ondřej Dušek, Albert Gatt...

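TL;DR: This study examines how automatic evaluation metrics are currently used in natural language generation (NLG) tasks and exposes shortcomings in existing practice, including inappropriate metric selection, missing implementation details, and a lack of correlation with human judgements. It also offers recommendations for making evaluation in the field more rigorous.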

Abstract

Automatic metrics are extensively used to evaluate natural language processing systems. However, there has been increasing focus on how they are used and reported by practitioners within the field. In this paper, we have conducted a …