Sep, 2020
Utility is in the Eye of the User: A Critique of NLP Leaderboards
Kawin Ethayarajh, Dan Jurafsky
TL;DR
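This paper uses microeconomic theory to examine the gap between leaderboards and real-world NLP applications. It argues that leaderboards do not adequately represent the NLP community as a whole. More transparent leaderboards should report statistics relevant to practical use, such as model size, energy efficiency, and inference latency, so that practitioners can better estimate how useful a model would be to them.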
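The TL;DR's central point is that an accuracy-only ranking need not match a given practitioner's preferences. A small Python sketch can illustrate this. Everything in it is hypothetical: the two models, their numbers, and the linear utility function with its weights were invented for illustration. None of it comes from the paper, which frames utility in microeconomic terms rather than prescribing this formula.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    accuracy: float    # leaderboard score, 0-100
    params_m: float    # model size in millions of parameters
    latency_ms: float  # inference latency per example, in milliseconds


# Made-up models for illustration only; these are not real leaderboard entries.
MODELS = [
    Model("large", accuracy=90.0, params_m=340.0, latency_ms=120.0),
    Model("small", accuracy=86.0, params_m=66.0, latency_ms=30.0),
]


def utility(m: Model, w_acc: float, w_size: float, w_lat: float) -> float:
    """Hypothetical linear utility: reward accuracy, penalize size and latency."""
    return w_acc * m.accuracy - w_size * m.params_m - w_lat * m.latency_ms


def rank(models: list[Model], **weights: float) -> list[str]:
    """Return model names ordered from highest to lowest utility for one user."""
    ordered = sorted(models, key=lambda m: utility(m, **weights), reverse=True)
    return [m.name for m in ordered]


# Accuracy-only view, as a leaderboard ranks: ['large', 'small']
print(rank(MODELS, w_acc=1.0, w_size=0.0, w_lat=0.0))
# A practitioner who also pays for size and latency: ['small', 'large']
print(rank(MODELS, w_acc=1.0, w_size=0.01, w_lat=0.05))
```

When accuracy is the only term, the larger model ranks first. Once size and latency carry a cost, the smaller model overtakes it. Reporting those statistics alongside accuracy would let a practitioner see this kind of reordering.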
Abstract
Benchmarks such as GLUE have helped drive advances in NLP by incentivizing the creation of more accurate models. While this leaderboard pa…