A prevalent belief among optimization specialists is that linear convergence of gradient descent is contingent on the function growing quadratically away from its minimizers. In this work, we argue that this belief is inaccurate. We show that gradient descent with an adaptive stepsize converges at a local (nearly) linear rate on any smooth function that merely exhibits fourth-order growth away from its minimizer. The adaptive stepsize we propose arises from an intriguing decomposition theorem: any such function admits a smooth manifold around the optimal solution -- which we call the ravine -- so that the function grows at least quadratically away from the ravine and has constant order growth along it. The ravine allows one to interlace many short gradient steps with a single long Polyak gradient step, which together ensure rapid convergence to the minimizer. We illustrate the theory and algorithm on the problems of matrix sensing and factorization and learning a single neuron in the overparameterized regime.
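To make the interlacing idea concrete, here is a minimal sketch on a toy objective, not the paper's algorithm. It assumes the hypothetical function f(x, y) = x^4 + y^2, whose ravine is the x-axis: f grows quadratically in y and only quartically in x. It also assumes the optimal value f* = 0 is known, which the Polyak step requires. The names `interlaced_gd` and `plain_gd`, the fixed short stepsize, and the fixed count of 20 short steps per round are illustrative assumptions; the paper's adaptive rule may differ.

```python
import numpy as np


def f(z):
    """Toy objective with fourth-order growth along x and quadratic growth along y."""
    x, y = z
    return x**4 + y**2


def grad(z):
    """Gradient of the toy objective."""
    x, y = z
    return np.array([4.0 * x**3, 2.0 * y])


def interlaced_gd(z0, f_star=0.0, short_step=0.25, n_short=20, n_outer=30):
    """Alternate many short gradient steps with one long Polyak step.

    The short steps pull the iterate toward the ravine (here, y = 0);
    the Polyak step then makes progress along the ravine.
    All schedule parameters are illustrative assumptions.
    """
    z = np.asarray(z0, dtype=float)
    for _ in range(n_outer):
        # Short steps: contract the component transverse to the ravine.
        for _ in range(n_short):
            z = z - short_step * grad(z)
        # Long step: Polyak stepsize (f(z) - f*) / ||grad f(z)||^2.
        g = grad(z)
        g_norm_sq = float(g @ g)
        if g_norm_sq == 0.0:
            break  # already at a stationary point
        z = z - (f(z) - f_star) / g_norm_sq * g
    return z


def plain_gd(z0, step=0.25, n_steps=630):
    """Constant-stepsize gradient descent, for comparison."""
    z = np.asarray(z0, dtype=float)
    for _ in range(n_steps):
        z = z - step * grad(z)
    return z


if __name__ == "__main__":
    start = [0.5, 1.0]
    # Both runs use the same number of gradient evaluations (30 * 21 = 630).
    print("interlaced:", np.linalg.norm(interlaced_gd(start)))
    print("plain GD:  ", np.linalg.norm(plain_gd(start)))
```

On this toy problem, once y is negligible the Polyak step reduces to x ← (3/4)x, a geometric contraction along the ravine, whereas constant-stepsize gradient descent only decreases x sublinearly.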

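This work addresses the widespread belief that linear convergence of gradient descent depends on the function growing quadratically away from its minimizers, and proves that gradient descent can still achieve local (nearly) linear convergence under a fourth-order growth condition. The proposed adaptive stepsize is based on an intriguing decomposition theorem: for such smooth functions, a "ravine" structure forms around the optimal solution and ensures rapid convergence. The work illustrates the theory and algorithm on matrix sensing, matrix factorization, and learning a single neuron in the overparameterized regime.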

Gradient descent with adaptive stepsize converges (nearly) linearly under fourth-order growth