BriefGPT.xyz
Dec, 2023
Unlocking optimal batch size schedules using continuous-time control and perturbation theory
Stefan Perko
TL;DR
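We derive optimal batch size schedules for stochastic gradient descent (SGD) and similar algorithms by approximating the discrete parameter-update process with a family of stochastic differential equations, which we then optimize using an expansion in the learning rate. We apply these results to linear regression tasks.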
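To make the setting concrete, here is a minimal sketch of mini-batch SGD on a synthetic linear regression problem where the batch size changes from step to step. Everything in it is an assumption for illustration: the function names, the synthetic data, and in particular the geometric `batch_size_schedule`, which is a hypothetical placeholder and not the optimal schedule derived in the paper.

```python
import random


def batch_size_schedule(step, base=8, growth=1.01, max_size=256):
    """Hypothetical geometric schedule (NOT the paper's optimal schedule)."""
    return int(min(max_size, base * growth ** step))


def make_data(n=500, d=3, noise=0.1, seed=1):
    """Synthetic data: y = <x, w_true> + Gaussian noise."""
    rng = random.Random(seed)
    w_true = [rng.gauss(0, 1) for _ in range(d)]
    X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    y = [sum(xi * wi for xi, wi in zip(x, w_true)) + noise * rng.gauss(0, 1)
         for x in X]
    return X, y, w_true


def sgd_linear_regression(X, y, lr=0.05, steps=500, seed=0):
    """Mini-batch SGD on mean squared error with a step-dependent batch size."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for k in range(steps):
        b = min(n, batch_size_schedule(k))
        batch = rng.sample(range(n), b)  # sample without replacement
        grad = [0.0] * d
        for i in batch:
            residual = sum(xi * wi for xi, wi in zip(X[i], w)) - y[i]
            for j in range(d):
                grad[j] += residual * X[i][j] / b
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


if __name__ == "__main__":
    X, y, w_true = make_data()
    w = sgd_linear_regression(X, y)
    print("estimate:", w)
    print("target:  ", w_true)
```

Small batches early on give cheap but noisy steps, and larger batches later reduce the gradient noise near the minimum. A schedule of this general shape is a common heuristic; the paper's contribution is to derive the schedule from a continuous-time approximation rather than to pick it by hand.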
Abstract
Stochastic gradient descent (SGD) and its variants are almost universally used to train neural networks and to fit a variety of other parametric models. An important hyperparameter in this context is the …