BriefGPT.xyz
Sep 2024
Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization
Mohammad Samragh, Iman Mirzadeh, Keivan Alizadeh Vahid, Fartash Faghri, Minsik Cho...
TL;DR
This work addresses the inefficiency of the pre-training phase of large language models by proposing a new method, HyperCloning, which initializes a large model from a small pre-trained model. The large model thereby inherits the small model's predictive power before training begins, significantly reducing the GPU hours required for pre-training.
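A minimal sketch of the initialization idea described above, assuming a simple function-preserving expansion of a single linear layer: the small model's weight matrix is tiled into a larger matrix and rescaled so that the larger layer reproduces the small layer's outputs on replicated inputs. The expand_linear helper and the 2x tiling-and-scaling rule are illustrative assumptions for this summary, not the paper's exact HyperCloning procedure.

```python
# Illustrative sketch only: one hypothetical function-preserving expansion rule.
import torch
import torch.nn as nn

def expand_linear(small: nn.Linear, factor: int = 2) -> nn.Linear:
    """Build a larger linear layer whose output replicates the small layer's
    output `factor` times when its input is the small input replicated
    `factor` times (assumed expansion rule, not the paper's exact scheme)."""
    big = nn.Linear(small.in_features * factor,
                    small.out_features * factor,
                    bias=small.bias is not None)
    with torch.no_grad():
        # Tile the small weight matrix into a (factor x factor) block grid and
        # divide by `factor` so the replicated inputs are averaged back.
        big.weight.copy_(small.weight.repeat(factor, factor) / factor)
        if small.bias is not None:
            big.bias.copy_(small.bias.repeat(factor))
    return big

# Sanity check: the expanded layer matches the small layer on replicated
# inputs, so the larger model starts from the small model's predictions
# rather than from random noise.
small = nn.Linear(8, 4)
big = expand_linear(small, factor=2)
x = torch.randn(8)
assert torch.allclose(big(x.repeat(2)), small(x).repeat(2), atol=1e-5)
```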
Abstract
The pre-training phase of language models often begins with randomly initialized parameters. With the current trends in scaling models, training their large number of parameters can be extremely slow and costly.