Jul 2018
A unified theory of adaptive stochastic gradient descent as Bayesian filtering
Laurence Aitchison
TL;DR
Using a Bayesian filtering approach, we propose a new neural-network optimizer, AdaBayes, which adaptively transitions between SGD and Adam, recovers the behaviour of AdamW, and matches the generalization performance of SGD.
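To make the filtering view concrete: treat each weight as a latent state tracked by a diagonal Kalman filter, take the minibatch gradient as a noisy observation, and use the posterior variance as a per-parameter step size. The sketch below is a minimal illustration of that idea only, not the paper's exact AdaBayes update; the function kalman_sgd_step and the constants q and r are placeholders chosen for the demo, with q (injected process noise) playing the role the paper assigns to the SGD learning rate and r (observation noise) the Adam-like scale. While gradients carry information the variance shrinks roughly like 1/sum(g^2); once the two effects balance, the per-parameter step scales like 1/|g|, which is the SGD-to-Adam transition the TL;DR describes.

import numpy as np

# Minimal diagonal Kalman-filter optimizer sketch (illustrative only;
# names and constants are placeholders, not the paper's AdaBayes update).
def kalman_sgd_step(w, s2, grad, q=1e-4, r=1.0):
    """One filtering step per parameter.

    w    : posterior mean (the weight itself)
    s2   : posterior variance, used as the per-parameter step size
    grad : minibatch gradient, treated as a noisy observation
    q    : process-noise variance, keeps the filter plastic (SGD-like role)
    r    : observation-noise variance (Adam-like scale)
    """
    s2_prior = s2 + q                                # predict: uncertainty grows
    s2_post = 1.0 / (1.0 / s2_prior + grad**2 / r)   # update: g^2/r adds precision
    w_new = w - s2_post * grad                       # variance sets the step size
    return w_new, s2_post

# Toy demo: noisy gradients on an ill-conditioned quadratic
# f(w) = 0.5 * sum(h * w**2), optimum at w = 0.
rng = np.random.default_rng(0)
h = np.array([1.0, 10.0])
w = np.array([5.0, 5.0])
s2 = np.full_like(w, 1e-2)   # initial variance acts as the initial SGD step size

for _ in range(2000):
    grad = h * w + 0.1 * rng.standard_normal(2)      # noisy gradient observation
    w, s2 = kalman_sgd_step(w, s2, grad)

print("w  =", w)    # near the optimum, up to gradient-noise jitter
print("s2 =", s2)   # per-parameter step sizes adapted to the gradient scale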
Abstract
There is a diverse array of schemes for adaptive stochastic gradient descent for optimizing neural networks, from fully factorised methods with and without momentum (e.g., rmsprop and adam), to Kronecker-factored …