We develop new sub-optimality bounds for gradient descent (GD) that depend on the conditioning of the objective along the path of optimization, rather than on global, worst-case constants. Key to our proofs is directional smoothness, a measure of gradient variation that we use to develop upper-bounds on the objective. Minimizing these upper-bounds requires solving implicit equations to obtain a sequence of strongly adapted step-sizes; we show that these equations are straightforward to solve for convex quadratics and lead to new guarantees for two classical step-sizes. For general functions, we prove that the Polyak step-size and normalized GD obtain fast, path-dependent rates despite using no knowledge of the directional smoothness. Experiments on logistic regression show our convergence guarantees are tighter than the classical theory based on L-smoothness.
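To make the path-dependent quantities concrete, the following is a minimal sketch rather than the authors' implementation. It runs GD with the classical Polyak step-size on a toy diagonal convex quadratic and records a point-wise directional smoothness estimate along the path. Several things here are assumptions for illustration: the objective `f(x) = 0.5 * sum(a_i * x_i^2)`, the curvatures `a = [1, 100]`, the known optimal value `f_star = 0`, the unscaled Polyak rule `eta = (f(x) - f_star) / ||grad f(x)||^2`, and the estimate `D(x, y) = 2 * ||grad f(y) - grad f(x)|| / ||y - x||`. All function names are hypothetical.

```python
import math


def f(x, a):
    """Toy convex quadratic f(x) = 0.5 * sum_i a_i * x_i^2 (assumed objective)."""
    return 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))


def grad(x, a):
    """Gradient of the toy quadratic: (a_1 * x_1, ..., a_d * x_d)."""
    return [ai * xi for ai, xi in zip(a, x)]


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(vi * vi for vi in v))


def polyak_gd(x0, a, f_star=0.0, steps=50):
    """GD with the Polyak step-size; f_star is assumed to be known.

    Returns the final iterate and a history of
    (objective value, step-size, directional smoothness estimate).
    """
    x = list(x0)
    history = []
    for _ in range(steps):
        g = grad(x, a)
        g_sq = sum(gi * gi for gi in g)
        if g_sq == 0.0:
            break  # stationary point reached
        eta = (f(x, a) - f_star) / g_sq  # Polyak step-size
        x_next = [xi - eta * gi for xi, gi in zip(x, g)]

        # Gradient variation measured only between consecutive iterates,
        # rather than over the whole domain as a global constant L would be.
        step_norm = norm([u - v for u, v in zip(x_next, x)])
        if step_norm == 0.0:
            break  # step too small to measure
        g_next = grad(x_next, a)
        d = 2.0 * norm([u - v for u, v in zip(g_next, g)]) / step_norm

        history.append((f(x, a), eta, d))
        x = x_next
    return x, history


if __name__ == "__main__":
    a = [1.0, 100.0]  # global smoothness constant is L = max(a) = 100
    x_final, history = polyak_gd([1.0, 1.0], a)
    for k, (fx, eta, d) in enumerate(history[:5]):
        print(f"k={k}  f={fx:.3e}  eta={eta:.4f}  D={d:.2f}")
```

In this sketch the recorded `D` values never exceed `2 * max(a)`, since the gradient of the quadratic is `max(a)`-Lipschitz, and they can be much smaller when a step moves mostly along the low-curvature coordinate. That gap is the kind of path-dependent slack that a bound based only on the global constant cannot capture.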

Directional Smoothness and Gradient Methods: Convergence and Adaptivity