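TL;DR: This paper proposes estimating the per-parameter second moments of neural network weight matrices from moving averages of their row and column sums, and addresses the problem of adaptive methods producing excessively large updates. The method achieves results comparable to Adam's default rules while using very little auxiliary storage.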
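As a rough illustration of the row-and-column-sum idea, here is a minimal NumPy sketch. It assumes the per-entry estimate is rebuilt as the outer product of the row and column averages divided by their total. The function names, the `beta2` default, and the `eps` constant are illustrative choices rather than values from the paper, and the sketch omits the handling of oversized updates.

```python
import numpy as np


def update_factored_moments(row_avg, col_avg, grad, beta2=0.999, eps=1e-30):
    """Update moving averages of the row and column sums of grad**2.

    Hypothetical helper: beta2 and eps are assumed values for illustration.
    """
    sq = grad ** 2 + eps  # eps keeps every entry strictly positive
    row_avg = beta2 * row_avg + (1.0 - beta2) * sq.sum(axis=1)
    col_avg = beta2 * col_avg + (1.0 - beta2) * sq.sum(axis=0)
    return row_avg, col_avg


def estimate_second_moments(row_avg, col_avg):
    """Rebuild an (n, m) estimate from n + m stored numbers."""
    return np.outer(row_avg, col_avg) / row_avg.sum()


# Usage: an (n, m) weight matrix needs only n + m accumulator entries.
n, m = 4, 3
rng = np.random.default_rng(0)
row_avg, col_avg = np.zeros(n), np.zeros(m)
for _ in range(10):
    grad = rng.normal(size=(n, m))
    row_avg, col_avg = update_factored_moments(row_avg, col_avg, grad)
v_hat = estimate_second_moments(row_avg, col_avg)
```

The reconstruction is exact whenever the matrix of squared gradients has rank one, and is an approximation otherwise.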
Abstract
In several recently proposed stochastic optimization methods (e.g. RMSProp,
Adam, Adadelta), parameter updates are scaled by the inverse square roots of
exponential moving averages of squared past gradients. Maintaining these
per-parameter second-moment estimators requires memory equal