May 2025
Superposition Yields Robust Neural Scaling
Yizhou Liu, Ziming Liu, Jeff Gore
TL;DR
This work addresses the unclear origin of neural scaling laws in today's large language models (LLMs) by proposing a toy model based on superposition and feature frequencies. It finds that when superposition is strong, the loss becomes inversely proportional to the model dimension, and this prediction is verified in an analysis of open-source LLMs. The results suggest that representation superposition is an important mechanism behind neural scaling laws and may inspire new training strategies and model architectures.
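The claimed inverse scaling of loss with model dimension under strong superposition can be illustrated with a small numerical sketch. The snippet below is not from the paper's code; the feature count and dimensions are arbitrary choices for illustration. It packs many random unit vectors into an m-dimensional space and shows that their mean squared overlap, a natural proxy for interference between superposed features, falls off as 1/m, consistent with a loss that scales inversely with model dimension.

```python
# Minimal sketch (assumed setup, not the paper's model): pack many more random
# unit vectors than dimensions into R^m and measure their typical interference.
import numpy as np

rng = np.random.default_rng(0)
n_features = 4096  # number of represented features (illustrative choice)

for m in (64, 128, 256, 512):
    # Random unit vectors stand in for feature representations in superposition.
    W = rng.standard_normal((n_features, m))
    W /= np.linalg.norm(W, axis=1, keepdims=True)

    overlaps = W @ W.T
    np.fill_diagonal(overlaps, 0.0)
    mean_sq_overlap = (overlaps ** 2).sum() / (n_features * (n_features - 1))

    # For independent random unit vectors in R^m, E[(u . v)^2] = 1/m,
    # so the interference between packed vectors shrinks as 1/m.
    print(f"m={m:4d}  mean squared overlap={mean_sq_overlap:.5f}  1/m={1/m:.5f}")
```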
Abstract
The success of today's large language models (LLMs) depends on the observation that larger models perform better. However, the origin of this neural scaling law -- the finding that loss decreases as a power law with model size -- remains unclear. …