The ever-growing ecosystem of LLMs makes it challenging to select the most appropriate pre-trained model to fine-tune from a sea of options. Under constrained resources, fine-tuning every candidate model and choosing among them afterward is unrealistic. In this work, we formulate this resource-constrained selection task as predicting fine-tuning performance and illustrate its natural connection with scaling laws. We find that, unlike in pre-training, the fine-tuning scaling curve includes not just the well-known "power phase" but also a previously unobserved "pre-power phase". We also explain, both theoretically and empirically, why existing scaling laws fail to capture this phase transition. To address this, we introduce the concept of "pre-learned data size" into our rectified scaling law, which overcomes these theoretical limitations and fits experimental results much better. Leveraging this law, we propose a novel LLM selection algorithm that selects a near-optimal model with hundreds of times less resource consumption, whereas other methods may produce selections that are negatively correlated with actual fine-tuning performance.
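The two phases can be illustrated with a small numerical sketch. The abstract does not state the functional form of the rectified law, so the code below assumes a form L(D) = B / (D_l + D^beta) + E, where D is the fine-tuning data size and D_l plays the role of the pre-learned data size. The function names and every parameter value are hypothetical and chosen only for illustration; they are not fitted results from the paper.

```python
import math

# Hypothetical parameters, chosen only for illustration (not from the paper).
B, D_L, BETA, E = 100.0, 1000.0, 0.5, 1.0


def reducible_loss(d, b=B, d_l=D_L, beta=BETA):
    """Assumed reducible part of the fine-tuning loss: B / (D_l + D**beta)."""
    return b / (d_l + d ** beta)


def rectified_loss(d, b=B, d_l=D_L, beta=BETA, e=E):
    """Assumed rectified law: reducible loss plus an irreducible term E."""
    return reducible_loss(d, b, d_l, beta) + e


def loglog_slope(d, rel_step=1e-4):
    """Central-difference slope of log(reducible loss) with respect to log(D)."""
    lo, hi = d * (1.0 - rel_step), d * (1.0 + rel_step)
    rise = math.log(reducible_loss(hi)) - math.log(reducible_loss(lo))
    run = math.log(hi) - math.log(lo)
    return rise / run


# When D**beta is far below D_l, the curve is nearly flat (pre-power phase).
slope_small = loglog_slope(10.0)
# When D**beta is far above D_l, the slope approaches -beta (power phase).
slope_large = loglog_slope(1e10)

print(f"slope at D=1e1:  {slope_small:.4f}")
print(f"slope at D=1e10: {slope_large:.4f}")
```

Under this assumed form, setting D_l to zero recovers an ordinary power law, which has a constant log-log slope and therefore cannot show the flat early segment.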

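In this study, targeting settings with limited resources, we address the challenge of selecting the most suitable model to fine-tune from among many options by predicting fine-tuning performance and clarifying its natural connection with scaling laws. We find that, unlike in pre-training, the fine-tuning scaling curve includes not only the well-known "power phase" but also a previously unobserved "pre-power phase". To overcome the theoretical and empirical limitations that prevent existing scaling laws from capturing this phase transition, we introduce the concept of "pre-learned data size" into a rectified scaling law, which greatly improves the fit to experimental results. Using this law, we propose a novel LLM selection algorithm that selects a near-optimal model with less resource consumption, whereas other methods may produce negatively correlated selections.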

Selecting Large Language Models to Fine-Tune via a Rectified Scaling Law