We propose the generalized Newton's method (GeN), a Hessian-informed approach that applies to any optimizer, such as SGD and Adam, and covers the Newton-Raphson method as a sub-case. Our method automatically and dynamically selects a learning rate that accelerates convergence, without intensive tuning of a learning rate scheduler. In practice, our method is easy to implement: it requires only additional forward passes, and its computational overhead (in training time and memory cost) is almost zero when amortized over many iterations. We present extensive experiments on language and vision tasks (e.g. GPT and ResNet) showing that GeN optimizers match the state-of-the-art performance previously achieved with carefully tuned learning rate schedulers. Code to be released at \url{https://github.com/ShiyunXu/AutoGeN}.
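To make the idea of choosing a learning rate from extra forward passes concrete, the following is a minimal sketch and not the released implementation. It assumes, beyond what the abstract states, that the loss along the update direction is approximated by a parabola fitted to three loss evaluations at step sizes -eta, 0 and +eta. The names `quadratic_fit_lr`, `quadratic_loss`, and `probe_eta` are hypothetical.

```python
def quadratic_fit_lr(loss_fn, weights, direction, probe_eta):
    """Estimate the step size that minimizes loss_fn(weights - lr * direction).

    Assumption (not taken from the abstract): the loss along `direction`
    is approximated by a parabola through three forward passes.
    """
    def loss_at(lr):
        # One forward pass at the shifted weights; no gradients are needed.
        return loss_fn([w - lr * d for w, d in zip(weights, direction)])

    l_minus = loss_at(-probe_eta)
    l_zero = loss_at(0.0)
    l_plus = loss_at(probe_eta)

    # Central finite differences of the 1-D function lr -> loss_at(lr).
    slope = (l_plus - l_minus) / (2.0 * probe_eta)
    curvature = (l_plus - 2.0 * l_zero + l_minus) / probe_eta ** 2

    if curvature <= 0.0:
        # The parabola has no minimum; fall back to the probe step size.
        return probe_eta
    # Vertex of the fitted parabola.
    return -slope / curvature


def quadratic_loss(weights):
    # Toy loss 0.5 * (1 * w0^2 + 10 * w1^2), used only for illustration.
    scales = [1.0, 10.0]
    return 0.5 * sum(s * w * w for s, w in zip(scales, weights))


weights = [1.0, 1.0]
gradient = [1.0 * weights[0], 10.0 * weights[1]]  # analytic gradient of the toy loss
lr = quadratic_fit_lr(quadratic_loss, weights, gradient, probe_eta=0.01)
print(lr)  # close to 101 / 1001 for this toy problem
```

On this toy quadratic the fit is exact up to rounding, so the returned value agrees with g·g / (g·Hg). In one dimension, with the gradient as the direction, the same formula gives a step of 1/h, which is the Newton-Raphson step. This is consistent with the sub-case mentioned above, although the sketch does not claim to reproduce the paper's derivation.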


Automatic Gradient Descent with Generalized Newton's Method