Abstract
The correlation between NLG automatic evaluation metrics and human evaluation is often regarded as a critical criterion for assessing the capability of an evaluation metric. However, different grouping methods and correlation coefficients result in various types of