Mar, 2024
Simple and Scalable Strategies to Continually Pre-train Large Language Models
Adam Ibrahim, Benjamin Thérien, Kshitij Gupta, Mats L. Richter, Quentin Anthony...
TL;DR
Continual learning strategies can successfully keep large language models up to date with simple and scalable methods, matching the performance of re-training from scratch while using only a small fraction of the compute.
Abstract
Large language models (LLMs) are routinely pre-trained on billions of tokens, only to start the process over again once new data becomes available. A much more efficient solution is to continually pre-train these models, saving significant compute compared to re-training.
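The abstract is truncated here, but the approach it motivates, updating an already pre-trained model on new data rather than restarting from scratch, is easy to sketch. A common recipe re-warms the learning rate, decays it again over the new tokens, and replays a small fraction of the original corpus to limit forgetting. The sketch below is a minimal illustration of that recipe, not this paper's method: `lr_at_step`, `mixed_batch`, and every hyperparameter are hypothetical choices, not values taken from the paper.

```python
# Minimal sketch of continual pre-training ingredients.
# All function names and hyperparameters are illustrative assumptions.
import math
import random

def lr_at_step(step, total_steps, warmup_steps=100, peak_lr=3e-4, min_lr=3e-5):
    """Linear re-warmup to peak_lr, then cosine re-decay down to min_lr."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

def mixed_batch(new_data, old_data, batch_size=8, replay_frac=0.05):
    """Sample a batch of mostly new data plus a small replay slice of old data."""
    n_replay = max(1, int(batch_size * replay_frac))  # at least one replay example
    batch = random.sample(old_data, n_replay)
    batch += random.sample(new_data, batch_size - n_replay)
    random.shuffle(batch)
    return batch

if __name__ == "__main__":
    # The schedule warms up over the first 100 steps, then decays back down.
    for s in (0, 50, 100, 500, 999):
        print(f"step {s:4d}  lr {lr_at_step(s, total_steps=1000):.6f}")
```

In an actual run, the schedule would drive an existing optimizer's parameter groups and the mixed batches would feed a standard token-level loss; the point is that both pieces add only a few lines on top of an ordinary pre-training loop, which is what makes continual pre-training cheap relative to re-training.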