BriefGPT.xyz
Jul, 2018
On the Relation Between the Sharpest Directions of the DNN Loss and the SGD Step Length
DNN's Sharpest Directions Along the SGD Trajectory
Stanisław Jastrzębski, Zachary Kenton, Nicolas Ballas, Asja Fischer, Yoshua Bengio...
TL;DR
Training with a small learning rate keeps SGD aligned with the sharpest directions of the loss and improves training speed and generalization, whereas a large learning rate or a small batch size steers SGD into wider regions of the loss surface.
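The claim above ties SGD's stability to the curvature along the sharpest direction, i.e. the top eigenvalue of the loss Hessian: gradient descent on a quadratic is stable along that direction only when the learning rate stays below 2/λ_max. A minimal sketch of this relationship on a toy quadratic loss with a hypothetical diagonal Hessian (not the paper's experimental setup):

```python
import numpy as np

# Toy quadratic loss L(w) = 0.5 * w^T H w with a known, hypothetical Hessian.
# The "sharpest direction" is the eigenvector of H with the largest
# eigenvalue; gradient descent is stable along it only if lr < 2 / lambda_max.
H = np.diag([10.0, 1.0, 0.1])  # assumed example Hessian, lambda_max = 10

def power_iteration(H, steps=100):
    """Estimate the top eigenvalue/eigenvector (the sharpest direction)."""
    v = np.ones(H.shape[0])
    for _ in range(steps):
        v = H @ v
        v /= np.linalg.norm(v)
    return v @ H @ v, v

lam_max, v_max = power_iteration(H)

def loss_after_gd(lr, steps=200):
    """Run plain gradient descent and return the final loss value."""
    w = np.ones(3)
    for _ in range(steps):
        w = w - lr * (H @ w)   # gradient of the quadratic loss is H w
    return 0.5 * w @ H @ w

print(round(lam_max, 3))             # top eigenvalue, approximately 10.0
print(loss_after_gd(0.15) < 1e-3)    # lr < 2/lambda_max: converges
print(loss_after_gd(0.25) > 1e3)     # lr > 2/lambda_max: diverges along the sharpest direction
```

With a large step size the iterate blows up along the sharpest eigenvector even though the step is stable along the flatter directions, which is the mechanism behind the learning-rate/sharpness trade-off the summary describes.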
Abstract
Recent work has identified that using a high learning rate or a small batch size for stochastic gradient descent (SGD) based training of deep neural networks …