June 2023
Aiming towards the minimizers: fast convergence of SGD for overparametrized problems
Chaoyue Liu, Dmitriy Drusvyatskiy, Mikhail Belkin, Damek Davis, Yi-An Ma
TL;DR
This paper proposes a regularity condition within the interpolation regime under which the stochastic gradient method attains the same worst-case iteration complexity as the deterministic gradient method, while using only a single sampled gradient (or a minibatch) in each iteration. Finally, the authors show that this condition holds when training sufficiently wide feedforward neural networks with a linear output layer.
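In symbols (standard notation for this setting, paraphrased rather than quoted from the paper): the training loss is a finite sum, interpolation means all terms share a common minimizer, and SGD updates with one sampled gradient per step:

$$f(w)=\frac{1}{n}\sum_{i=1}^{n} f_i(w), \qquad f_i(w^\star)=\min_w f_i(w)\ \text{for all } i, \qquad w_{k+1}=w_k-\eta\,\nabla f_{i_k}(w_k),$$

where $i_k$ is drawn uniformly from $\{1,\dots,n\}$. The paper's regularity condition is what allows this single-sample update to match the worst-case iteration complexity of deterministic gradient descent.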
Abstract
Modern machine learning paradigms, such as deep learning, occur in or close to the interpolation regime, wherein the number of model parameters is much larger than the number of data samples. In this work, we propose a regularity condition within the interpolation regime which endows the stochastic gradient method with the same worst-case iteration complexity as the deterministic gradient method, while using only a single sampled gradient (or a minibatch) in each iteration. Finally, we demonstrate that our condition holds when training sufficiently wide feedforward neural networks with a linear output layer.
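The setting can be illustrated with a small experiment (a hypothetical sketch, not the paper's own code; problem sizes and step sizes are illustrative): on an overparametrized least-squares problem, where interpolation holds exactly, constant-step-size SGD with a single sampled gradient per iteration drives the training loss to zero, matching full-batch gradient descent at equal gradient-evaluation budget.

```python
import numpy as np

# Toy interpolation regime: d >> n, so parameters with zero training
# loss exist (the labels are exactly realizable by construction).
rng = np.random.default_rng(0)
n, d = 20, 200                        # n samples, d parameters
A = rng.normal(size=(n, d))           # data matrix
b = A @ rng.normal(size=d)            # realizable labels

loss = lambda w: 0.5 * np.mean((A @ w - b) ** 2)

# Smoothness constants: L for the averaged loss, L_max over per-sample losses.
L = np.linalg.eigvalsh(A @ A.T).max() / n
L_max = (A ** 2).sum(axis=1).max()

# Full-batch gradient descent: 500 iterations, each touching all n samples.
w_gd = np.zeros(d)
for _ in range(500):
    w_gd -= (1.0 / L) * A.T @ (A @ w_gd - b) / n

# SGD: one sampled gradient per iteration at a constant step size; the
# total number of per-sample gradient evaluations matches GD above.
w_sgd = np.zeros(d)
for _ in range(500 * n):
    i = rng.integers(n)
    w_sgd -= (1.0 / L_max) * (A[i] @ w_sgd - b[i]) * A[i]

print(f"GD  final loss: {loss(w_gd):.2e}")
print(f"SGD final loss: {loss(w_sgd):.2e}")
```

Both losses land near machine precision: under interpolation the sampled gradients vanish at the common minimizer, so the stochastic noise dies out and no decaying step-size schedule is needed.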