BriefGPT.xyz
Dec, 2024
Establishing Task Scaling Laws via Compute-Efficient Model Ladders
Akshita Bhagia, Jiacheng Liu, Alexander Wettig, David Heineman, Oyvind Tafjord...
TL;DR
This work addresses the problem of predicting the individual task performance of pretrained language models in the overtrained setting, proposing a novel two-step prediction method. By training small-scale "ladder" models, we successfully predict the target model's task accuracy at a compute cost of only 1% of the target model's, demonstrating the method's strength for establishing scaling laws.
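The two-step idea above can be sketched as two chained curve fits: first fit how task loss falls with training compute across the small ladder models, then fit how task accuracy follows from task loss, and compose the two to extrapolate to a larger target model. The sketch below uses synthetic ladder data and illustrative functional forms (a shifted power law and a sigmoid), not the paper's exact parameterization.

```python
# Hedged sketch of the two-step prediction: ladder compute -> task loss -> accuracy.
# All numbers are synthetic; the functional forms are illustrative assumptions.
import numpy as np
from scipy.optimize import curve_fit

# Synthetic ladder models: training compute (in units of 1e18 FLOPs),
# with observed task loss and task accuracy generated for illustration.
compute = np.array([1.0, 3.0, 10.0, 30.0, 100.0])
task_loss = 0.8 + 1.5 * compute ** -0.3                      # hypothetical ground truth
accuracy = 0.25 + 0.7 / (1 + np.exp(4 * (task_loss - 1.5)))  # hypothetical ground truth

# Step 1: task loss as a shifted power law in compute, L(C) = a * C^-b + c.
def loss_law(C, a, b, c):
    return a * C ** -b + c

p1, _ = curve_fit(loss_law, compute, task_loss, p0=[1.5, 0.3, 0.8], maxfev=10000)

# Step 2: task accuracy as a sigmoid of task loss (floor + span / logistic).
def acc_law(L, lo, span, k, L0):
    return lo + span / (1 + np.exp(k * (L - L0)))

p2, _ = curve_fit(acc_law, task_loss, accuracy, p0=[0.25, 0.7, 4.0, 1.5], maxfev=10000)

# Chain the two fits to predict the accuracy of a much larger target model
# (here 10x the biggest ladder model's compute).
target_compute = 1000.0
pred_acc = acc_law(loss_law(target_compute, *p1), *p2)
print(f"predicted accuracy at {target_compute:.0f}e18 FLOPs: {pred_acc:.3f}")
```

The point of the chaining is that the ladder models are cheap enough to train many of, so both fits are well constrained before any compute is spent on the target model.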
Abstract
We develop task scaling laws and model ladders to predict the individual task performance of pretrained language models (LMs) in the overtrained setting. Standard power laws for language modeling loss cannot accurately …