BriefGPT.xyz
Feb, 2025
LESA: Learnable LLM Layer Scaling-Up
Yifei Yang, Zouying Cao, Xinbei Ma, Yao Yao, Libo Qin...
TL;DR
This work addresses the prohibitive computational cost of training large language models (LLMs) from scratch by proposing LESA, a novel learnable layer scaling-up method. By concatenating layer parameters and applying Singular Value Decomposition, LESA uncovers latent patterns across layers, enabling better model initialization and faster training. Experiments show that LESA outperforms existing baselines during continual pre-training while significantly reducing computational cost.
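The idea of concatenating layer parameters and applying SVD to expose cross-layer patterns can be illustrated with a minimal sketch. This is not the paper's exact algorithm (LESA learns its predictor; all sizes and the linear extrapolation below are assumptions for illustration): stack the same weight matrix from every layer, factor the stack with SVD, and extrapolate the per-layer coefficients to initialize a new layer.

```python
import numpy as np

# Toy setup: a stand-in for one weight matrix per transformer layer.
# (Real LLM layers are far larger; sizes here are illustrative only.)
rng = np.random.default_rng(0)
num_layers, d = 6, 8
layers = [rng.standard_normal((d, d)) for _ in range(num_layers)]

# Flatten each layer's weights into one row -> (num_layers, d*d).
stacked = np.stack([w.ravel() for w in layers])

# SVD exposes directions shared across layers: U holds per-layer
# coefficients, Vt holds the shared parameter-space basis.
U, S, Vt = np.linalg.svd(stacked, full_matrices=False)

# Initialize a hypothetical "next" layer by linearly extrapolating the
# top-k coefficients along the depth axis, then projecting back.
k = 3
depth = np.arange(num_layers)
coeffs_next = np.array([
    np.polyval(np.polyfit(depth, U[:, i], 1), num_layers)
    for i in range(k)
])
new_layer = ((coeffs_next * S[:k]) @ Vt[:k]).reshape(d, d)
print(new_layer.shape)  # (8, 8)
```

The sketch replaces LESA's learned inter-layer predictor with a simple linear fit, but it shows why an SVD-based initialization can beat naive layer duplication: the new layer inherits structure shared by the existing layers rather than copying any single one.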
Abstract
Training Large Language Models (LLMs) from scratch requires immense computational resources, making it prohibitively expensive. Model scaling-up offers a promising solution by leveraging the parameters of smaller…