BriefGPT.xyz
Apr, 2022
Curriculum: A Broad-Coverage Benchmark for Linguistic Phenomena in Natural Language Understanding
Zeming Chen, Qiyue Gao
TL;DR
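This paper introduces Curriculum, a new NLI benchmark comprising datasets and evaluation procedures for 36 types of broad-coverage linguistic phenomena. It shows that this linguistic-phenomena-driven benchmark is effective for diagnosing model behavior and verifying the quality of model learning, and it offers insights for future research on dataset redesign, model architectures, and learning objectives.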
Abstract
In the age of large transformer language models, linguistic evaluation plays an important role in diagnosing models' abilities and limitations on natural language understanding. However, current evaluation methods