BriefGPT.xyz
Nov, 2023
Separating the Wheat from the Chaff with BREAD: An open-source benchmark and metrics to detect redundancy in text
Isaac Caswell, Lisa Wang, Isabel Papadimitriou
TL;DR
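This work creates a human-labeled benchmark that separates repetitive boilerplate from plausible linguistic content and evaluates the effectiveness of CRED (Character REDundancy) scores. Together these provide resources for improving filtering methods and support the development of clean language-modeling corpora for low-resource languages.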
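To make the idea of a character-redundancy score concrete, here is a minimal sketch. It is not the CRED definition from the paper, which this summary does not spell out; it assumes a generic compression-ratio proxy, and the function name `char_redundancy` and the sample strings are hypothetical.

```python
import zlib


def char_redundancy(text: str) -> float:
    """Hypothetical redundancy proxy (an assumption, not the paper's CRED):
    1 - compressed_size / raw_size, clipped to the range [0, 1].
    Highly repetitive text compresses well and scores close to 1."""
    raw = text.encode("utf-8")
    if not raw:
        return 0.0
    compressed = zlib.compress(raw, 9)
    return max(0.0, 1.0 - len(compressed) / len(raw))


# Illustrative inputs only: repeated catalog-style boilerplate vs. ordinary prose.
boilerplate = "Price: $9.99 | Add to cart | " * 40
prose = (
    "Data quality is a problem that resurfaces throughout NLP, "
    "regardless of task, domain, or architecture."
)

print(f"boilerplate: {char_redundancy(boilerplate):.2f}")
print(f"prose:       {char_redundancy(prose):.2f}")
```

A filter built on a score like this would drop documents above some threshold; choosing that threshold so it agrees with human judgments across languages is the kind of question a labeled benchmark is meant to answer.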
Abstract
Data quality is a problem that perpetually resurfaces throughout the field of NLP, regardless of task, domain, or architecture, and remains especially severe for …