We analyze the sample complexity of full-batch Gradient Descent (GD) in the setup of non-smooth Stochastic Convex Optimization. We show that the generalization error of GD, with (minmax) optimal choice of hyper-parameters, can be $\tilde \Theta(d/m + 1/\sqrt{m})$, where $d$ is the dimension and $m$ is the sample size. This matches the sample complexity of \emph{worst-case} empirical risk minimizers. That means that, in contrast with other algorithms, GD has no advantage over naive ERMs. Our bound follows from a new generalization bound that depends on both the dimension as well as the learning rate and number of iterations. Our bound also shows that, for general hyper-parameters, when the dimension is strictly larger than number of samples, $T=\Omega(1/\epsilon^4)$ iterations are necessary to avoid overfitting. This resolves an open problem by \citet*{schliserman2024dimension, amir2021sgd}, and improves over previous lower bounds that demonstrated that the sample size must be at least square root of the dimension.

我们分析了非光滑随机凸优化中全批量梯度下降（GD）的样本复杂性，表明GD的泛化误差与最优超参数选择的样本复杂性匹配，可以表示为Θ(d/m + 1/√m)，其中d为维度，m为样本大小，这与最坏情况下的经验风险最小化器相符，这意味着与其他算法相比，GD没有优势。我们的界限来自一个新的泛化界限，它取决于维度、学习速率和迭代次数。对于一般的超参数，当维度严格大于样本数量时，需要Ω(1/ε^4)次迭代才能避免过拟合，这解决了schliserman2024dimension和amir2021sgd提出的一个开放问题，并改进了先前的下界，前者证明了样本大小至少必须为维度的平方根。

随机凸优化中梯度下降的样本复杂度