BriefGPT.xyz
Oct, 2023
Grokking as the Transition from Lazy to Rich Training Dynamics
Tanishq Kumar, Blake Bordelon, Samuel J. Gershman, Cengiz Pehlevan
TL;DR
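Grokking arises when a neural network transitions from lazy training dynamics to a rich feature-learning regime. Studying two-layer neural networks on a polynomial regression problem, we find that the rate of feature learning and the alignment of the initial features with the target function are the key factors that give rise to grokking.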
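To make the lazy-versus-rich distinction concrete, here is a minimal NumPy sketch under assumptions of our own, not the paper's setup. It trains a centered two-layer ReLU network f = α(g − g₀) with full-batch gradient descent on the loss mean((f − y)²)/(2α²). This output-scaling device comes from the lazy-training literature: a large α keeps training close to the kernel (lazy) regime, and a small α allows the first-layer features to move. The target y = (β·x)² − 1 for a hidden unit vector β, the sizes, the learning rate, and the helper names `make_data` and `train` are all hypothetical choices. The sketch also leaves out the initial feature/target alignment that the TL;DR names as the second key factor.

```python
import numpy as np

# Hypothetical toy setup; the paper's own parametrization may differ.


def relu(h):
    return np.maximum(h, 0.0)


def make_data(n, d, beta, rng):
    """Gaussian inputs; assumed target (beta . x)^2 - 1, a degree-2 polynomial."""
    x = rng.standard_normal((n, d))
    z = x @ beta
    return x, z ** 2 - 1.0


def train(alpha, d=10, width=200, n_train=100, n_test=1000,
          steps=1500, lr=0.2, seed=0):
    """Full-batch GD on f = alpha * (g - g0) with loss mean(r^2) / (2 alpha^2).

    Large alpha -> lazy (kernel-like) training; small alpha -> rich training.
    Returns train/test MSE histories and the relative change of the
    first-layer weights ||W - W0|| / ||W0||.
    """
    rng = np.random.default_rng(seed)
    beta = rng.standard_normal(d)
    beta /= np.linalg.norm(beta)  # unit-norm hidden direction
    x_tr, y_tr = make_data(n_train, d, beta, rng)
    x_te, y_te = make_data(n_test, d, beta, rng)

    W = rng.standard_normal((width, d))
    a = rng.standard_normal(width)
    W0 = W.copy()

    def g(x, W, a):
        # NTK-style scaling: g(x) = a . relu(W x / sqrt(d)) / sqrt(width)
        return relu(x @ W.T / np.sqrt(d)) @ a / np.sqrt(width)

    # Subtract the initial outputs so the model f starts at exactly zero.
    g0_tr = g(x_tr, W, a)
    g0_te = g(x_te, W, a)

    train_hist, test_hist = [], []
    for step in range(steps + 1):
        h = x_tr @ W.T / np.sqrt(d)
        act = relu(h)
        r = alpha * (act @ a / np.sqrt(width) - g0_tr) - y_tr
        f_te = alpha * (g(x_te, W, a) - g0_te)
        train_hist.append(np.mean(r ** 2))
        test_hist.append(np.mean((f_te - y_te) ** 2))
        if step == steps:
            break
        # dLoss/dg = r / (n * alpha), then backpropagate through both layers.
        dg = r / (n_train * alpha)
        grad_a = act.T @ dg / np.sqrt(width)
        dh = np.outer(dg, a) / np.sqrt(width) * (h > 0)
        grad_W = dh.T @ x_tr / np.sqrt(d)
        a -= lr * grad_a
        W -= lr * grad_W

    rel_change = np.linalg.norm(W - W0) / np.linalg.norm(W0)
    return np.array(train_hist), np.array(test_hist), rel_change
```

To look for a gap between when the train loss and the test loss fall, call `train(alpha=10.0)` and `train(alpha=0.1)` and plot the two returned histories. The returned relative weight change gives a rough measure of how far each run moves its features away from initialization.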
Abstract
We propose that the grokking phenomenon, where the train loss of a neural network decreases much earlier than its test loss, can arise due to a n…