BriefGPT.xyz
Mar, 2025
Compute Optimal Scaling of Skills: Knowledge vs Reasoning
Nicholas Roberts, Niladri Chatterji, Sharan Narang, Mike Lewis, Dieuwke Hupkes
TL;DR
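This paper studies whether compute-optimal scaling behavior depends on the skill being measured, focusing on knowledge-related and reasoning-related skills such as knowledge question answering and code generation. It finds that scaling behavior differs fundamentally between skills, and that a misspecified validation set can shift the compute-optimal parameter count by nearly 50%, which offers important insight for LLM optimization.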
Abstract
Scaling laws are a critical component of the LLM development pipeline, most famously as a way to forecast training decisions such as 'compute-optimally' trading-off parameter count and dataset size, alongside a m