Oct, 2024
Compute-Constrained Data Selection
Junjie Oscar Yin, Alexander M. Rush
TL;DR
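This study addresses how to select training data effectively under a compute constraint and proposes a utility function model that accounts for the cost of selection. Multi-task experiments show that many mainstream data selection methods are not compute-optimal, while cheaper data selection methods perform better both in theory and in practice.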
Abstract
Data selection can reduce the amount of training data needed to finetune LLMs; however, the efficacy of data selection scales directly with its compute. Motivated by the practical challenge of compute-constrained