Mar, 2024
Language models scale reliably with over-training and on downstream tasks
Samir Yitzhak Gadre, Georgios Smyrnis, Vaishaal Shankar, Suchin Gururangan, Mitchell Wortsman...
TL;DR
Building on language model scaling laws, this study constructs a testbed of 104 models trained on varying numbers of tokens across three data distributions to investigate the relationship between scaling in the over-trained regime and language models' downstream task performance.
Abstract
Scaling laws are useful guides for developing language models, but there are still gaps between current scaling studies and how language models …