BriefGPT.xyz
Oct, 2022
DEMETR: Diagnosing Evaluation Metrics for Translation
Marzena Karpinska, Nishant Raj, Katherine Thai, Yixiao Song, Ankita Gupta...
TL;DR
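This study investigates how machine translation evaluation metrics behave. Using DEMETR, a diagnostic dataset of 35 distinct linguistic perturbations spanning semantic, syntactic, and morphological error categories, it finds that learned metrics outperform string-based metrics and that the learned metrics differ in their sensitivity to different phenomena. DEMETR is publicly released to encourage further development of machine translation evaluation metrics.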
Abstract
While machine translation evaluation metrics based on string overlap (e.g., BLEU) have their limitations, their computations are transparent: the BLEU score assigned to a particular candidate translation can be t…