Feb, 2020
A Second Look at Exponential and Cosine Step Sizes: Simplicity, Adaptivity, and Performance
Exponential Step Sizes for Non-Convex Optimization
Xiaoyu Li, Zhenxun Zhuang, Francesco Orabona
TL;DR
The paper shows that exponential and cosine step sizes are adaptive to the noise level: they achieve near-optimal performance without knowing the noise level or tuning hyperparameters accordingly. It analyzes the convergence rates and empirical behavior of these two strategies, and experiments demonstrate that they reach excellent performance while requiring at most two hyperparameters to tune.
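As a rough illustration of the two schedules discussed above, here is a minimal Python sketch assuming the standard forms eta_t = eta_0 * alpha^t for the exponential step size and eta_t = (eta_0 / 2) * (1 + cos(t * pi / T)) for the cosine step size; the function names, toy objective, and constants below are illustrative choices, not taken from the paper.

```python
import math
import random

def exp_step(eta0: float, alpha: float, t: int) -> float:
    # Exponential schedule: eta_t = eta0 * alpha^t, with 0 < alpha < 1.
    return eta0 * alpha ** t

def cosine_step(eta0: float, t: int, T: int) -> float:
    # Cosine schedule: eta_t = (eta0 / 2) * (1 + cos(t * pi / T)).
    return 0.5 * eta0 * (1.0 + math.cos(t * math.pi / T))

def sgd(schedule, T=1000, noise=0.1, seed=0):
    # Toy demo: SGD on f(x) = x^2 with additive Gaussian gradient noise.
    rng = random.Random(seed)
    x = 5.0
    for t in range(T):
        grad = 2.0 * x + rng.gauss(0.0, noise)  # stochastic gradient of x^2
        x -= schedule(t) * grad
    return x

if __name__ == "__main__":
    T = 1000
    # eta0 = 0.1 and alpha = 0.995 are illustrative values, not the paper's tuned settings.
    print(sgd(lambda t: exp_step(0.1, 0.995, t), T))  # exponential decay
    print(sgd(lambda t: cosine_step(0.1, t, T), T))   # cosine annealing
```

In both schedules the only knobs are the initial step size eta_0 and one decay parameter (alpha or the horizon T), which matches the summary's point that at most two hyperparameters need tuning.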
Abstract
Stochastic gradient descent (SGD) is a popular tool in large scale optimization of machine learning objective functions. However, the performance is greatly variable, depending on the choice of the step sizes. In …