Jul 2019
Lookahead Optimizer: k steps forward, 1 step back
Michael R. Zhang, James Lucas, Geoffrey Hinton, Jimmy Ba
TL;DR
This paper proposes a new optimization algorithm, Lookahead, which improves on the widely used SGD and Adam optimizers and can increase both training stability and final performance.
Abstract
The vast majority of successful deep neural networks are trained using variants of stochastic gradient descent (SGD) algorithms. Recent attempts to improve SGD can be broadly categorized into two approaches: (1) adaptive learning rate schemes, such as AdaGrad and Adam, and (2) accelerated schemes, such as heavy-ball and Nesterov momentum.
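As a rough illustration of the "k steps forward, 1 step back" idea described in the title and TL;DR, the sketch below shows a minimal Lookahead-style loop in NumPy: an inner optimizer (plain SGD here, chosen only for illustration) takes k fast-weight steps, after which the slow weights are moved a fraction alpha toward the fast weights and the fast weights are reset. This is a sketch under assumed names (`grad_fn`, `inner_lr`, the quadratic toy loss), not the authors' implementation.

```python
import numpy as np

def lookahead_sgd(grad_fn, w0, *, k=5, alpha=0.5, inner_lr=0.1, outer_steps=100):
    """Illustrative Lookahead-style loop (a sketch, not the paper's code).

    grad_fn(w) -> gradient of the loss at w (assumed to be supplied by the caller).
    k          -> number of fast-weight (inner SGD) steps per outer step.
    alpha      -> slow-weight step size: slow <- slow + alpha * (fast - slow).
    """
    slow = np.asarray(w0, dtype=float).copy()
    for _ in range(outer_steps):
        fast = slow.copy()
        for _ in range(k):                 # "k steps forward" with the inner optimizer
            fast -= inner_lr * grad_fn(fast)
        slow += alpha * (fast - slow)      # "1 step back": interpolate toward the fast weights
    return slow

# Hypothetical usage: minimize the toy quadratic f(w) = ||w||^2 / 2, whose gradient is w.
if __name__ == "__main__":
    final_w = lookahead_sgd(lambda w: w, w0=np.ones(3), k=5, alpha=0.5, inner_lr=0.1)
    print(final_w)  # converges toward the minimizer at the origin
```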