Accelerating Stochastic Gradient Descent with Momentum for Over-parameterized Learning

Oct, 2018

MaSS: an Accelerated Stochastic Method for Over-parametrized Learning

Chaoyue Liu, Mikhail Belkin

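TL;DR: This paper introduces MaSS, an algorithm that uses the same step size as SGD but converges at an accelerated rate. It addresses the non-convergence of Nesterov SGD and analyzes how the convergence rate and the optimal hyperparameters depend on the mini-batch size. Experiments show that MaSS outperforms SGD, Nesterov SGD, and Adam across several deep network architectures.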

Abstract

In this paper we introduce MaSS (Momentum-added Stochastic Solver), an accelerated SGD method for optimizing over-parameterized networks. Our method is simple and efficient to implement and does not require changing parameters or computing full gradients in the course of optimization.
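
To make the idea concrete, here is a minimal sketch of one MaSS-style update. The abstract above does not state the update rule; the form used here (a Nesterov-style step plus a gradient "compensation" term scaled by a second step size) is recalled from the body of the paper and should be checked against it. The function name `mass_step`, the toy objective f(w) = 0.5 * w^2, and all hyperparameter values are assumptions chosen for illustration, and the stochastic gradient is replaced by an exact gradient to keep the example deterministic.

```python
def mass_step(w, u, grad_u, eta1, eta2, gamma):
    """One MaSS-style update (sketch; rule assumed, not taken from the abstract).

    w      : current main iterate
    u      : current auxiliary (look-ahead) iterate
    grad_u : gradient estimate evaluated at u
    eta1   : primary step size (the same role as the SGD step size)
    eta2   : step size of the compensation term (eta2 = 0 gives Nesterov SGD)
    gamma  : momentum coefficient
    """
    # Gradient step taken from the auxiliary point.
    w_next = u - eta1 * grad_u
    # Momentum extrapolation plus the compensation term eta2 * grad_u.
    u_next = (1 + gamma) * w_next - gamma * w + eta2 * grad_u
    return w_next, u_next


# Toy problem: f(w) = 0.5 * w**2, so the gradient at u is simply u.
# Hyperparameter values below are hypothetical, picked only so the toy run converges.
w, u = 1.0, 1.0
for _ in range(200):
    w, u = mass_step(w, u, grad_u=u, eta1=0.1, eta2=0.05, gamma=0.5)

print(w)  # final iterate, close to the minimizer at 0
```

Setting `eta2=0` in this sketch drops the compensation term and leaves a plain Nesterov-style update, which is the method the TL;DR says MaSS is designed to correct. Because the function only uses arithmetic operators, it also accepts NumPy arrays in place of floats.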