PLM Perplexity Is Unreliable for Text Quality Evaluation

Oct, 2022

Perplexity from PLM Is Unreliable for Evaluating Text Quality

Yequan Wang, Jiawen Deng, Aixin Sun, Xuying Meng

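TL;DR: This paper describes problems with using perplexity (PPL) to evaluate the quality of generated text, including its unfavorable behavior on short texts and the effect that repeated text spans and punctuation marks have on its value. The experiments find that perplexity is unreliable. Finally, the paper discusses the key issues in using language models to evaluate text quality.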
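For readers who want the quantity under discussion made concrete, the sketch below computes perplexity from per-token log-probabilities using the standard definition (the exponential of the negative mean log-probability). The function name `perplexity` and all probability values are hypothetical and chosen only for illustration; they are not taken from the paper, and a real evaluation would obtain the log-probabilities from a pre-trained language model.

```python
import math


def perplexity(token_log_probs):
    """Return exp of the negative mean natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Hypothetical values: four tokens, each assigned probability 0.25.
base = [math.log(0.25)] * 4

# The same sequence followed by one token the model finds very likely
# (for example a sentence-final punctuation mark), probability 0.9.
with_extra_token = base + [math.log(0.9)]

print(perplexity(base))              # approximately 4
print(perplexity(with_extra_token))  # lower than the value above
```

Because the score is a per-token average, a single easily predicted token shifts it noticeably on a short sequence, which is one way to see why length and punctuation can matter for this metric.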

Abstract

Recently, many works have used perplexity (PPL) to evaluate the quality of generated text. They suppose that if the value of PPL i