Stochastic convex optimization algorithms are the most popular way to train machine learning models on large-scale data. Scaling up the training process of these models is crucial in many applications, but the most popular algorithm, Stochastic Gradient Descent (SGD), is a serial algorithm that is surprisingly hard to parallelize. In this paper, we propose an efficient distributed stochastic optimization method based on adaptive step sizes and variance reduction techniques. We achieve a linear speedup in the number of machines, small memory footprint, and only a small number of synchronization rounds -- logarithmic in dataset size -- in which the computation nodes communicate with each other. Critically, our approach is a general reduction than parallelizes any serial SGD algorithm, allowing us to leverage the significant progress that has been made in designing adaptive SGD algorithms. We conclude by implementing our algorithm in the Spark distributed framework and exhibit dramatic performance gains on large-scale logistic regression problems.

本文提出了一种高效的分布式随机优化方法，通过结合适应性与方差约减技术，从而实现任何串行在线学习算法的并行计算，能够在不需要光滑参数的先验知识的情况下实现最优收敛速率，同时通过Spark分布式框架的实现能够对大规模逻辑回归问题进行高效处理。

自适应SGD分布式随机优化