Are All the Datasets in Benchmark Necessary? A Pilot Study of Dataset Evaluation for Text Classification

May 2022

Are All the Datasets in Benchmark Necessary? A Pilot Study of Dataset Evaluation for Text Classification

Yang Xiao, Jinlan Fu, See-Kiong Ng, Pengfei Liu

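TL;DR: The paper asks whether every dataset in a benchmark is necessary. Experiments show that some less commonly used datasets have strong power to distinguish between systems. For text classification, the authors also build a prediction model based on dataset characteristics.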

Abstract

In this paper, we ask the research question of whether all the datasets in the benchmark are necessary. We approach this by first characterizing the distinguishability of …