June 2024
Efficient Continual Pre-training by Mitigating the Stability Gap
Yiduo Guo, Jie Fu, Huishuai Zhang, Dongyan Zhao, Yikang Shen
TL;DR
Continual pre-training is one of the main approaches for adapting large language models (LLMs) to new domains. This work examines the behavior and performance of LLMs during this process and proposes three effective strategies for improving LLM performance under a fixed compute budget; experiments confirm that these strategies achieve satisfactory results on both medical-task and general-task performance.
Abstract
Continual pre-training has increasingly become the predominant approach for adapting large language models (LLMs) to new domains. This process involves updating the pre-trained LLM with a corpus from a new domain …
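The abstract describes continual pre-training as further training a pre-trained LLM on a corpus from a new domain. The minimal sketch below illustrates that basic procedure with the Hugging Face Transformers `Trainer`; the base model, corpus file, and hyperparameters are placeholder assumptions and are not taken from the paper, which studies strategies beyond this plain setup.

```python
# Minimal continual pre-training sketch: continue next-token-prediction
# training of a pre-trained causal LM on a new-domain text corpus.
# Model name, corpus path, and hyperparameters are illustrative only.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)
from datasets import load_dataset

model_name = "gpt2"                    # placeholder base LLM
corpus_path = "new_domain_corpus.txt"  # placeholder new-domain corpus

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Load the raw domain corpus and tokenize it.
raw = load_dataset("text", data_files={"train": corpus_path})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_ds = raw.map(tokenize, batched=True, remove_columns=["text"])

# Standard causal-LM objective (mlm=False), i.e. continued pre-training.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="continual-pretrain",
    per_device_train_batch_size=4,
    num_train_epochs=1,
    learning_rate=5e-5,
    logging_steps=50,
)

Trainer(model=model, args=args, train_dataset=train_ds,
        data_collator=collator).train()
```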