TL;DR: We compare Stochastic Average Gradient (SAG) against several classical machine-learning optimization algorithms, and propose combining SAG with momentum and with Adam; the combined methods optimize the test functions faster and achieve better performance.
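For readers unfamiliar with the baseline, the core of SAG is to keep one stored gradient per sample and step with the average of all stored gradients, refreshing only the gradient of the sampled index each iteration. The sketch below illustrates this on a least-squares problem; the function name, learning rate, and problem setup are illustrative choices, not from this paper.

```python
import numpy as np

def sag(X, y, lr=0.1, n_iters=5000, seed=0):
    """Minimal SAG sketch for least squares (illustrative, not the paper's code)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    grads = np.zeros((n, d))   # per-sample gradient memory
    g_sum = np.zeros(d)        # running sum of the stored gradients
    for _ in range(n_iters):
        i = rng.integers(n)
        # gradient of (x_i . w - y_i)^2 with respect to w
        g_new = 2.0 * (X[i] @ w - y[i]) * X[i]
        g_sum += g_new - grads[i]   # swap sample i's stored gradient into the sum
        grads[i] = g_new
        w -= lr * g_sum / n         # step with the average of all stored gradients
    return w

# tiny usage check: recover the generating weights w* = [2, -1]
X = np.array([[1., 0.], [0., 1.], [1., 1.], [1., -1.]])
y = X @ np.array([2., -1.])
w = sag(X, y)
```

Unlike plain SGD, the averaged memory makes the update's variance shrink as the iterates approach the optimum, which is what the momentum- and Adam-based variants discussed here build on.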
Abstract
Despite the recent growth of theoretical studies and empirical successes of neural networks, gradient backpropagation is still the most widely used algorithm for training such networks. On the one hand, we have d