Reproducing the Metric-Based Evaluation of a Set of Controllable Text Generation Techniques

May 2024

Michela Lorandi, Anya Belz

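TL;DR: Rerunning a metric-based evaluation should be more straightforward, and yield closer results, than rerunning a human evaluation. Even so, a rerun does not necessarily reproduce the original results, and it can reveal errors in the original work.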

Abstract

Rerunning a metric-based evaluation should be more straightforward, and results should be closer, than in a human-based evaluation, especially where code and model checkpoints are made available by the original authors. As this report of our efforts to →