Farshed Abdukhakimov, Chulu Xiang, Dmitry Kamzolov, Martin Takáč
TL;DR: Extends the Stochastic Gradient Descent with Polyak Step-size (SPS) method with preconditioning techniques such as Hutchinson's method, Adam, and AdaGrad to improve its performance on badly scaled and/or ill-conditioned datasets.
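To make the TL;DR concrete, the following is a minimal, hypothetical sketch of a Polyak step-size update combined with an AdaGrad-style diagonal preconditioner, on an interpolating least-squares problem where each per-sample optimal value is f_i^* = 0. All names and the toy problem are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Toy badly scaled regression problem (illustrative assumption):
# columns of X differ in scale by several orders of magnitude.
rng = np.random.default_rng(0)
n, d = 50, 5
scales = np.array([1.0, 10.0, 0.1, 5.0, 0.01])
X = rng.normal(size=(n, d)) * scales
w_true = rng.normal(size=d)
y = X @ w_true  # exact interpolation, so f_i^* = 0 for every sample

w = np.zeros(d)
G = np.zeros(d)  # AdaGrad accumulator: running sum of squared gradients
eps = 1e-8

for t in range(2000):
    i = rng.integers(n)
    r = X[i] @ w - y[i]
    f_i = 0.5 * r**2                  # per-sample loss, with f_i^* = 0
    g = r * X[i]                      # gradient of f_i at w
    G += g**2
    D_inv = 1.0 / (np.sqrt(G) + eps)  # inverse diagonal preconditioner
    denom = g @ (D_inv * g) + eps     # squared gradient norm in the D^{-1} metric
    step = f_i / denom                # Polyak step-size under the preconditioner
    w -= step * (D_inv * g)           # preconditioned SPS update

print(np.linalg.norm(w - w_true))
```

The preconditioner rescales each coordinate by its accumulated gradient magnitude, so the Polyak step is no longer dominated by the largest-scale feature; on badly scaled data this is exactly the failure mode the paper targets.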
Abstract
Stochastic gradient descent (SGD) is one of many iterative optimization methods widely used to solve machine learning problems. These methods display valuable properties and attract researchers and