Jan 2013
Adaptive learning rates and parallelization for stochastic, sparse, non-smooth gradients
Tom Schaul, Yann LeCun
TL;DR
To address the difficulty of tuning stochastic gradient descent (SGD), this paper proposes a method that automatically reduces the learning rate without any manual tuning, and further improves performance by handling parallelization, the update rule, non-smooth loss functions, and Hessian estimation within the iteration. The final algorithm has linear complexity and requires no hyperparameters.
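The summary above describes a per-parameter adaptive learning rate for SGD. Below is a minimal sketch of that idea, assuming the vSGD-style rate η_i = ḡ_i² / (h_i · v̄_i) from the authors' prior work on adaptive learning rates; the function name `adaptive_sgd_step`, the state layout, and the curvature handling are illustrative assumptions, not the exact algorithm of this paper.

```python
import numpy as np

def adaptive_sgd_step(theta, grad, curv, state, eps=1e-8):
    """One SGD step with a per-parameter adaptive learning rate,
    sketched in the spirit of Schaul & LeCun's vSGD framework.

    theta : parameter vector
    grad  : stochastic gradient at theta
    curv  : diagonal curvature (Hessian) estimate at theta
    state : dict of running averages "g", "v", "h" and memory "tau"
    """
    g, v, h, tau = state["g"], state["v"], state["h"], state["tau"]

    # Exponential moving averages with per-parameter memory 1/tau.
    g = g + (grad - g) / tau
    v = v + (grad ** 2 - v) / tau
    h = h + (np.abs(curv) - h) / tau

    # Adaptive per-parameter learning rate: eta_i = g_i^2 / (h_i * v_i).
    eta = g ** 2 / (h * v + eps)

    # Memory grows when the gradient is mostly noise and shrinks when
    # the averaged gradient dominates, keeping the rate adaptive.
    tau = (1.0 - g ** 2 / (v + eps)) * tau + 1.0

    state.update(g=g, v=v, h=h, tau=tau)
    return theta - eta * grad


# Illustrative usage on the toy quadratic loss 0.5 * ||theta||^2,
# whose gradient is theta and whose diagonal Hessian is 1.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = rng.normal(size=5)
    state = {"g": np.zeros(5), "v": np.ones(5),
             "h": np.ones(5), "tau": np.full(5, 2.0)}
    for _ in range(200):
        grad = theta + 0.1 * rng.normal(size=5)   # noisy gradient
        theta = adaptive_sgd_step(theta, grad, np.ones(5), state)
    print(theta)  # should end up close to the optimum at 0
```

There are no step sizes or decay schedules to tune: the learning rate is derived entirely from the running gradient, squared-gradient, and curvature statistics, which is the hyperparameter-free property the TL;DR refers to.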
Abstract
Recent work has established an empirically successful framework for adapting learning rates for stochastic gradient descent (SGD). This effectively removes all needs for tuning, while automatically reducing […]