May, 2022
Are All the Datasets in Benchmark Necessary? A Pilot Study of Dataset Evaluation for Text Classification
Yang Xiao, Jinlan Fu, See-Kiong Ng, Pengfei Liu
TL;DR
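This paper examines whether all the datasets in a benchmark are necessary. Experiments show that some less commonly used datasets have strong discriminative power, and, for the text classification task, the authors build a prediction model from dataset features.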
Abstract
In this paper, we ask the research question of whether all the datasets in the benchmark are necessary. We approach this by first characterizing the distinguishability of …