There's No Data Like Better Data: Using Quality Metrics to Filter MT Data

Nov, 2023

There's no Data Like Better Data: Using QE Metrics for MT Data Filtering

Jan-Thorsten Peter, David Vilar, Daniel Deutsch, Mara Finkelstein, Juraj Juraska...

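TL;DR: Filtering the sentence pairs in the training data with quality estimation (QE) metrics can improve translation quality while cutting the training set size in half.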
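To make the idea concrete, here is a minimal sketch of score-based filtering, assuming each (source, target) pair can be scored by some QE function and that the top-scoring half is kept. The names `filter_by_qe`, `toy_length_ratio_score`, and `keep_fraction` are hypothetical and introduced only for illustration. The toy scorer is a stand-in, not a real QE metric, and the summary above does not say which metric or selection rule the authors actually use.

```python
from typing import Callable, List, Tuple

SentencePair = Tuple[str, str]  # (source sentence, target sentence)


def filter_by_qe(
    pairs: List[SentencePair],
    score_fn: Callable[[str, str], float],
    keep_fraction: float = 0.5,
) -> List[SentencePair]:
    """Keep the highest-scoring fraction of sentence pairs.

    score_fn is assumed to return a higher value for a better translation.
    The original corpus order is preserved among the kept pairs.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    if not pairs:
        return []

    # Score every pair once, remembering its position in the corpus.
    scored = [(score_fn(src, tgt), i) for i, (src, tgt) in enumerate(pairs)]

    # Rank by score (best first) and keep the top fraction, at least one pair.
    n_keep = max(1, int(len(pairs) * keep_fraction))
    top = sorted(scored, key=lambda item: item[0], reverse=True)[:n_keep]

    # Restore corpus order for the surviving pairs.
    kept_indices = sorted(i for _, i in top)
    return [pairs[i] for i in kept_indices]


def toy_length_ratio_score(src: str, tgt: str) -> float:
    """Placeholder scorer (NOT a QE metric): ratio of shorter to longer length."""
    n_src, n_tgt = len(src.split()), len(tgt.split())
    if n_src == 0 or n_tgt == 0:
        return 0.0
    return min(n_src, n_tgt) / max(n_src, n_tgt)


if __name__ == "__main__":
    corpus = [
        ("a b c", "x y z"),
        ("a b c d", "x"),
        ("hello world", "hallo welt"),
        ("one two three four five six", "eins"),
    ]
    for src, tgt in filter_by_qe(corpus, toy_length_ratio_score):
        print(src, "|||", tgt)
```

In practice `score_fn` would wrap a learned QE model, and the cut-off could be an absolute score threshold rather than a fixed fraction. Both choices are left open here.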

Abstract

Quality estimation (QE), the evaluation of machine translation output without the need of explicit references, has seen big improvements in the last years with the use of ...