Farshed Abdukhakimov, Chulu Xiang, Dmitry Kamzolov, Martin Takáč
TL;DR: Extends the Stochastic Gradient Descent with Polyak Step-size (SPS) method with preconditioning techniques such as Hutchinson's method, Adam, and AdaGrad to improve its performance on badly scaled and/or ill-conditioned datasets.
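To make the TL;DR concrete, the following is a minimal, hypothetical sketch of a Polyak step-size update combined with an AdaGrad-style diagonal preconditioner, on an interpolating least-squares problem where each per-sample optimal value is f_i^* = 0. All names and the toy problem are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Toy badly scaled regression problem (illustrative assumption):
# columns of X differ in scale by several orders of magnitude.
rng = np.random.default_rng(0)
n, d = 50, 5
scales = np.array([1.0, 10.0, 0.1, 5.0, 0.01])
X = rng.normal(size=(n, d)) * scales
w_true = rng.normal(size=d)
y = X @ w_true  # exact interpolation, so f_i^* = 0 for every sample

w = np.zeros(d)
G = np.zeros(d)  # AdaGrad accumulator: running sum of squared gradients
eps = 1e-8

for t in range(2000):
    i = rng.integers(n)
    r = X[i] @ w - y[i]
    f_i = 0.5 * r**2                  # per-sample loss, with f_i^* = 0
    g = r * X[i]                      # gradient of f_i at w
    G += g**2
    D_inv = 1.0 / (np.sqrt(G) + eps)  # inverse diagonal preconditioner
    denom = g @ (D_inv * g) + eps     # squared gradient norm in the D^{-1} metric
    step = f_i / denom                # Polyak step-size under the preconditioner
    w -= step * (D_inv * g)           # preconditioned SPS update

print(np.linalg.norm(w - w_true))
```

The preconditioner rescales each coordinate by its accumulated gradient magnitude, so the Polyak step is no longer dominated by the largest-scale feature; on badly scaled data this is exactly the failure mode the paper targets.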
Abstract
Stochastic gradient descent (SGD) is one of many iterative optimization methods widely used to solve machine learning problems. These methods display valuable properties and attract researchers and