BriefGPT.xyz
Aug, 2023
Continual Pre-Training of Large Language Models: How to (re)warm your model?
Kshitij Gupta, Benjamin Thérien, Adam Ibrahim, Mats L. Richter, Quentin Anthony...
TL;DR
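This study examines how different warm-up strategies affect large language models. It finds that re-warming the model improves downstream performance and outperforms models trained from scratch, even on large downstream datasets.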
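The summary does not say what exactly is warmed up. Assuming, as is common in LLM pre-training, that it refers to the learning-rate schedule (a linear warm-up followed by cosine decay), the sketch below shows what "re-warming" could look like when training resumes on a new dataset: each new data stage ramps the learning rate back up before decaying it again. The function names, the choice to restart the warm-up from zero, and all hyperparameter values are hypothetical and chosen for illustration only; they are not the authors' settings.

```python
import math


def warmup_cosine_lr(step: int, max_lr: float, min_lr: float,
                     warmup_steps: int, total_steps: int) -> float:
    """Learning rate at `step` for a linear warm-up followed by cosine decay.

    Warm-up ramps linearly from 0 to max_lr over `warmup_steps`; the
    remaining steps follow a half-cosine from max_lr down to min_lr.
    """
    if warmup_steps > 0 and step < warmup_steps:
        return max_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def continual_schedule(stage_steps, max_lr, min_lr, warmup_steps):
    """Concatenate one warm-up + cosine cycle per data stage.

    Assumption (hypothetical): each new stage "re-warms" by climbing back
    from 0 to max_lr, instead of continuing at the previous stage's min_lr.
    """
    lrs = []
    for total in stage_steps:
        lrs.extend(warmup_cosine_lr(s, max_lr, min_lr, warmup_steps, total)
                   for s in range(total))
    return lrs
```

For example, `continual_schedule([100, 100], max_lr=1.0, min_lr=0.1, warmup_steps=10)` yields a first cycle that decays to roughly 0.1 by step 99, then drops back to 0 at step 100 and climbs to 1.0 again by step 110. The temporary rise in learning rate at the start of each new stage is the "re-warming" whose effect on upstream and downstream performance the study measures.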
Abstract
Large language models (LLMs) are routinely pre-trained on billions of tokens, only to restart the process from scratch once new data becomes available. A much cheaper and more efficient solution would be to enable the continual pre-training of these models…