BriefGPT.xyz
May, 2024
观察性缩放律与语言模型性能的可预测性
Observational Scaling Laws and the Predictability of Language Model Performance
HTML
PDF
Yangjun Ruan, Chris J. Maddison, Tatsunori Hashimoto
TL;DR
通过观测法利用多个已有模型家族构建单一的扩展律,展示了复杂的扩展现象是可预测的,模型性能可以从简单的非代理基准准确预测,预测了后期训练干预的影响。
Abstract
Understanding how
language model performance
varies with scale is critical to benchmark and algorithm development.
scaling laws
are one approach to building this understanding, but the requirement of training mod
→