BriefGPT.xyz
Jul, 2022
How Much More Data Do I Need? Estimating Requirements for Downstream Tasks
Rafid Mahmood, James Lucas, David Acuna, Daiqing Li, Jonah Philion...
TL;DR
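This paper studies the problem of estimating the data requirements of machine learning systems. It explores a family of generalized power-law functions to estimate the relationship between dataset size and target performance more accurately, and it introduces a correction factor and a multi-round data collection strategy to refine the estimates, saving development time and data acquisition costs.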
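As a rough illustration of the basic idea, the sketch below fits a plain power law, score ≈ a · n^b, to a handful of (dataset size, validation score) pairs and inverts it to estimate the dataset size needed for a target score. It is a minimal sketch under stated assumptions, not the paper's method: the function names, the plain two-parameter form, and the measurements are all hypothetical, and the correction factor and multi-round collection strategy mentioned in the summary are left out.

```python
import math


def fit_power_law(sizes, scores):
    """Least-squares fit of score ~ a * n**b, done as a line fit in log-log space."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(s) for s in scores]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope in log-log space = exponent
    a = math.exp(y_mean - b * x_mean)  # intercept in log-log space = log(a)
    return a, b


def required_size(a, b, target):
    """Invert target = a * n**b to get the dataset size n."""
    return (target / a) ** (1.0 / b)


# Hypothetical measurements (not from the paper): validation scores
# observed after training on growing subsets of the available data.
sizes = [1000, 2000, 4000, 8000]
scores = [0.50, 0.55, 0.60, 0.66]

a, b = fit_power_law(sizes, scores)
n_needed = math.ceil(required_size(a, b, target=0.80))
print(f"fitted a={a:.4f}, b={b:.4f}; estimated samples for 0.80: {n_needed}")
```

A fit like this extrapolates well beyond the observed range, so the estimate is only as good as the assumed functional form; that sensitivity is the motivation the summary gives for exploring generalized power laws and adding a correction factor.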
Abstract
Given a small training data set and a learning algorithm, how much more data is necessary to reach a target validation or test performance? This question is of critical importance in applications such as autonomo...