This work introduces a simple regressive ensemble, built on a set of novel and established metrics, for evaluating machine translation quality. We evaluate the ensemble by its correlation with the expert-based MQM scores of the WMT 2021 Metrics workshop. In both monolingual and zero-shot cross-lingual settings, we show a significant performance improvement over single metrics. In the cross-lingual setting, we also demonstrate that the ensemble approach applies well to unseen languages. Furthermore, we identify a strong reference-free baseline that consistently outperforms the commonly used BLEU and METEOR measures and significantly improves our ensemble's performance.
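
To make the idea of a regressive ensemble concrete, the following is a minimal sketch and not the implementation evaluated in this work. It assumes a plain linear least-squares regressor over three per-segment metric scores and uses Pearson correlation with human scores as the evaluation criterion. The data are synthetic, and every name (`fit_ensemble`, `predict`, `make_segment`, the noise levels) is hypothetical, chosen only for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not mutated.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ensemble(metric_scores, human_scores):
    """Least-squares weights (bias first) mapping metric scores to human scores."""
    rows = [[1.0] + list(scores) for scores in metric_scores]
    k = len(rows[0])
    # Normal equations: (X^T X) w = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, human_scores)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights, metric_scores):
    """Ensemble score for each segment: bias plus weighted sum of its metrics."""
    return [
        weights[0] + sum(w * v for w, v in zip(weights[1:], scores))
        for scores in metric_scores
    ]


# Synthetic stand-in data (an assumption, not WMT data): each segment has a
# latent quality, three noisy "metric" scores, and a less noisy "human" score.
rng = random.Random(0)


def make_segment():
    quality = rng.random()
    metrics = [quality + rng.gauss(0.0, 0.3) for _ in range(3)]
    human = quality + rng.gauss(0.0, 0.05)
    return metrics, human


segments = [make_segment() for _ in range(1000)]
train, test = segments[:500], segments[500:]

weights = fit_ensemble([m for m, _ in train], [h for _, h in train])

test_human = [h for _, h in test]
ensemble_r = pearson(predict(weights, [m for m, _ in test]), test_human)
single_rs = [pearson([m[i] for m, _ in test], test_human) for i in range(3)]

print("single-metric correlations:", single_rs)
print("ensemble correlation:", ensemble_r)
```

In this toy setup the ensemble correlates better with the simulated human scores than any single metric, because the regressor averages out independent metric noise. Real metrics are correlated with one another and human judgments are noisier, so the size of the gain here says nothing about the results reported in this work.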

Regressive Ensemble for Machine Translation Quality Evaluation