BriefGPT.xyz
Dec, 2021
Bidimensional Leaderboards: Generate and Evaluate Language Hand in Hand
Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Lavinia Dunagan, Jacob Morrison...
TL;DR
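Proposes a new comparison framework, Bidimensional Leaderboards, which tracks progress in language generation models and in evaluation metrics at the same time. Human evaluations are used to rank and select the evaluation metrics, so that both models and metrics act as competitors, and the selected metrics are finally combined into an ensemble evaluation metric.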
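To make the idea concrete, here is a minimal sketch of ranking metrics against human judgments and combining the best ones. It is an illustration under stated assumptions, not the paper's procedure: the Pearson correlation criterion, the uniform average of z-scored metrics, the function names (`pearson`, `rank_metrics`, `ensemble`), and the toy scores are all hypothetical.

```python
from statistics import mean, pstdev

def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def rank_metrics(metric_scores, human):
    """Rank metrics by correlation with human scores, best first."""
    pairs = [(name, pearson(scores, human)) for name, scores in metric_scores.items()]
    return sorted(pairs, key=lambda p: p[1], reverse=True)

def zscore(xs):
    """Standardize scores so metrics on different scales can be averaged."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]

def ensemble(metric_scores, human, k=2):
    """Assumed combination rule: uniform average of the top-k z-scored metrics."""
    top = [name for name, _ in rank_metrics(metric_scores, human)[:k]]
    columns = [zscore(metric_scores[name]) for name in top]
    return top, [mean(values) for values in zip(*columns)]

# Toy data (hypothetical): one human score and three metric scores per output.
human = [1, 2, 3, 4, 5]
metric_scores = {
    "metric_a": [0.1, 0.2, 0.3, 0.4, 0.5],
    "metric_b": [0.5, 0.4, 0.6, 0.7, 0.9],
    "metric_c": [0.9, 0.1, 0.8, 0.2, 0.5],
}

ranking = rank_metrics(metric_scores, human)
selected, combined = ensemble(metric_scores, human, k=2)
print(ranking)
print(selected, combined)
```

In this sketch the metric leaderboard is simply the sorted `ranking` list, and a generator leaderboard could be built by averaging `combined` over each system's outputs.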
Abstract
Natural language processing researchers have identified limitations of evaluation methodology for generation tasks, with new questions raised about the validity of automatic metrics and of crowdworker judgments.