The stability and generalization of stochastic gradient-based methods provide valuable insights into the algorithmic performance of machine learning models. As the main workhorse of deep learning, stochastic gradient descent (SGD) has been studied extensively. Nevertheless, the community has paid little attention to its decentralized variants. In this paper, we provide a novel formulation of decentralized stochastic gradient descent. Leveraging this formulation together with (non)convex optimization theory, we establish the first stability and generalization guarantees for decentralized stochastic gradient descent. Our theoretical results rest on a few common and mild assumptions and reveal, for the first time, that decentralization deteriorates the stability of SGD. We verify our theoretical findings in a variety of decentralized settings and on benchmark machine learning models.
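
For readers unfamiliar with the setting, the following is a minimal sketch of the commonly used decentralized SGD update, in which each node averages its neighbours' iterates through a doubly stochastic mixing matrix and then takes a local stochastic gradient step. It is an illustration only and is not claimed to be the formulation developed in this paper: the ring topology, the scalar least-squares objective, and the names `ring_mixing_matrix` and `dsgd` are all assumptions made for the example.

```python
import random


def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a ring (assumed topology):
    each node puts weight 1/3 on itself and on each of its two neighbours."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] += 1.0 / 3.0
    return W


def dsgd(local_data, W, lr=0.1, steps=500, seed=0):
    """Decentralized SGD on a scalar least-squares problem (hypothetical example).

    local_data[i] is a list of (a, b) samples held by node i; node i's sample
    loss is 0.5 * (a * x - b) ** 2. Returns the final iterate of every node.
    """
    rng = random.Random(seed)
    n = len(local_data)
    x = [0.0] * n  # one local iterate per node
    for _ in range(steps):
        # Each node draws one local sample and evaluates the gradient at its own iterate.
        grads = []
        for i in range(n):
            a, b = rng.choice(local_data[i])
            grads.append((a * x[i] - b) * a)
        # Gossip-average neighbouring iterates, then apply the local gradient step.
        x = [
            sum(W[i][j] * x[j] for j in range(n)) - lr * grads[i]
            for i in range(n)
        ]
    return x


# Noise-free data with b = 2 * a, so every node's minimizer is x = 2.
data_rng = random.Random(1)
data = [
    [(a, 2.0 * a) for a in (data_rng.uniform(0.5, 1.5) for _ in range(4))]
    for _ in range(5)
]
iterates = dsgd(data, ring_mixing_matrix(5))
print(iterates)
```

In this noise-free toy problem all five local iterates approach the shared minimizer. A stability analysis of the kind studied here would instead compare the outputs of two runs whose training sets differ in a single sample.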

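This paper proposes a novel formulation of decentralized stochastic gradient descent and, using (non)convex optimization theory, establishes the first stability and generalization guarantees for it. The theoretical results rest on a few common and mild assumptions and show, for the first time, that decentralization deteriorates the stability of SGD. Experiments across a variety of decentralized settings and benchmark machine learning models confirm the theoretical findings.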

Stability and Generalization of Decentralized Stochastic Gradient Descent