BriefGPT.xyz
Feb, 2025
LESA: Learnable LLM Layer Scaling-Up
Yifei Yang, Zouying Cao, Xinbei Ma, Yao Yao, Libo Qin...
TL;DR
This work addresses the prohibitive computational cost of training large language models (LLMs) from scratch by proposing LESA, a novel learnable layer scaling-up method. By concatenating layer parameters and applying Singular Value Decomposition, LESA uncovers latent patterns across layers, enabling better model initialization and faster training. Experiments show that LESA outperforms existing baselines during continual pre-training while significantly reducing computational cost.
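The idea of concatenating layer parameters and applying SVD to expose cross-layer patterns can be illustrated with a minimal sketch. This is not the paper's exact algorithm (LESA learns its predictor; all sizes and the linear extrapolation below are assumptions for illustration): stack the same weight matrix from every layer, factor the stack with SVD, and extrapolate the per-layer coefficients to initialize a new layer.

```python
import numpy as np

# Toy setup: a stand-in for one weight matrix per transformer layer.
# (Real LLM layers are far larger; sizes here are illustrative only.)
rng = np.random.default_rng(0)
num_layers, d = 6, 8
layers = [rng.standard_normal((d, d)) for _ in range(num_layers)]

# Flatten each layer's weights into one row -> (num_layers, d*d).
stacked = np.stack([w.ravel() for w in layers])

# SVD exposes directions shared across layers: U holds per-layer
# coefficients, Vt holds the shared parameter-space basis.
U, S, Vt = np.linalg.svd(stacked, full_matrices=False)

# Initialize a hypothetical "next" layer by linearly extrapolating the
# top-k coefficients along the depth axis, then projecting back.
k = 3
depth = np.arange(num_layers)
coeffs_next = np.array([
    np.polyval(np.polyfit(depth, U[:, i], 1), num_layers)
    for i in range(k)
])
new_layer = ((coeffs_next * S[:k]) @ Vt[:k]).reshape(d, d)
print(new_layer.shape)  # (8, 8)
```

The sketch replaces LESA's learned inter-layer predictor with a simple linear fit, but it shows why an SVD-based initialization can beat naive layer duplication: the new layer inherits structure shared by the existing layers rather than copying any single one.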
Abstract
Training Large Language Models (LLMs) from scratch requires immense computational resources, making it prohibitively expensive. Model scaling-up offers a promising solution by leveraging the parameters of smaller…