We consider gradient descent (GD) with a constant stepsize applied to
logistic regression with linearly separable data, where the constant stepsize
$\eta$ is so large that the loss initially oscillates. We show that GD exits
this initial oscillatory phase rapidly -- in $\mathcal{O}(\eta)$ steps -- and
subsequently achieves an $\tilde{\mathcal{O}}(1/(\eta t))$ convergence rate
after $t$ additional steps. Our results imply that, given a budget of $T$
steps, GD can achieve an accelerated loss of $\tilde{\mathcal{O}}(1/T^2)$ with
an aggressive stepsize $\eta := \Theta(T)$, without any use of momentum or
variable stepsize schedulers. Our proof technique is versatile and also handles
general classification loss functions (where exponential tails are needed for
the $\tilde{\mathcal{O}}(1/T^2)$ acceleration), nonlinear predictors in the
neural tangent kernel regime, and online stochastic gradient descent (SGD) with
a large stepsize, under suitable separability conditions.
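
The two-phase behavior described above can be illustrated with a minimal NumPy sketch. It is not the paper's experimental setup: the two-point dataset `Z`, the budget `T = 100`, and the helper names `logistic_loss`, `logistic_grad`, and `run_gd` are all assumptions chosen for illustration. The data are linearly separable (for example by $w = (0, 1)$), yet the first large step misclassifies one point, so the loss rises before GD settles into a decreasing phase.

```python
import numpy as np

def logistic_loss(w, Z):
    # Z holds label-folded features z_i = y_i * x_i, so the margins are Z @ w.
    # logaddexp(0, -m) is a numerically stable log(1 + exp(-m)).
    return float(np.mean(np.logaddexp(0.0, -(Z @ w))))

def logistic_grad(w, Z):
    margins = Z @ w
    # sigmoid(-m), computed stably as exp(-log(1 + exp(m))).
    s = np.exp(-np.logaddexp(0.0, margins))
    return -(Z * s[:, None]).mean(axis=0)

def run_gd(Z, eta, num_steps):
    # Constant-stepsize GD from w = 0; returns the loss at every iterate.
    w = np.zeros(Z.shape[1])
    losses = [logistic_loss(w, Z)]
    for _ in range(num_steps):
        w = w - eta * logistic_grad(w, Z)
        losses.append(logistic_loss(w, Z))
    return np.array(losses)

# Hypothetical toy data (an assumption, not from the paper): separable,
# but the mean direction (0.25, 0.2) has a negative margin on the second point.
Z = np.array([[1.0, 0.2],
              [-0.5, 0.2]])

T = 100                                               # step budget
losses_small = run_gd(Z, eta=1.0, num_steps=T)        # conservative stepsize
losses_large = run_gd(Z, eta=float(T), num_steps=T)   # eta proportional to T

print("large eta, first steps:", losses_large[:6])
print("final loss, eta = 1:   ", losses_small[-1])
print("final loss, eta = T:   ", losses_large[-1])
```

On this toy problem the large-stepsize run first pushes the loss above its initial value $\log 2$ and then ends with a smaller loss than the $\eta = 1$ run for the same budget. A single two-point example only illustrates the phenomenon; it says nothing about the rates themselves.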

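In summary: gradient descent with a constant stepsize, applied to logistic regression on linearly separable data, leaves the initial oscillatory phase within $\mathcal{O}(\eta)$ steps and then converges at rate $\tilde{\mathcal{O}}(1/(\eta t))$ over the next $t$ steps. Given a total budget of $T$ steps, an aggressive stepsize $\eta = \Theta(T)$ therefore yields an accelerated loss of $\tilde{\mathcal{O}}(1/T^2)$, without momentum or variable stepsize schedulers.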