We address the challenge of estimating the learning rate for adaptive gradient methods used to train deep neural networks. Several learning-rate-free approaches have been proposed, but they are typically tailored to steepest descent. Although steepest descent offers an intuitive route to finding minima, many deep learning applications require adaptive gradient methods to achieve faster convergence. In this paper, we interpret adaptive gradient methods as steepest descent applied to parameter-scaled networks and, building on this view, propose learning-rate-free adaptive gradient methods. Experimental results verify the effectiveness of this approach, showing performance comparable to hand-tuned learning rates across various scenarios. This work extends the applicability of learning-rate-free methods to training with adaptive gradient methods.
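The interpretation of adaptive gradient methods as steepest descent on a parameter-scaled network can be illustrated with a minimal sketch. It assumes an RMSProp-style diagonal update without momentum, and it treats the per-coordinate scale `s = (sqrt(v) + eps) ** -0.5` as fixed within a step. Writing `w = s * u` and taking a plain gradient step on `u` reproduces the adaptive step on `w`, because `s * (w / s - lr * s * g) = w - lr * g / (sqrt(v) + eps)`. The function names, the hyperparameters `beta` and `eps`, and the toy quadratic loss are illustrative assumptions. This shows only the reparameterization identity, not the full learning-rate-free algorithm proposed here.

```python
import math


def rmsprop_step(w, g, v, lr, beta=0.99, eps=1e-8):
    """One RMSProp-style adaptive step (no momentum); returns (new_w, new_v)."""
    v = [beta * vi + (1 - beta) * gi * gi for vi, gi in zip(v, g)]
    new_w = [wi - lr * gi / (math.sqrt(vi) + eps) for wi, gi, vi in zip(w, g, v)]
    return new_w, v


def scaled_sgd_step(w, g, v, lr, beta=0.99, eps=1e-8):
    """Plain gradient descent on u = w / s, with s held fixed for this step."""
    v = [beta * vi + (1 - beta) * gi * gi for vi, gi in zip(v, g)]
    s = [(math.sqrt(vi) + eps) ** -0.5 for vi in v]      # per-coordinate scale
    u = [wi / si for wi, si in zip(w, s)]                 # parameter-scaled coordinates
    grad_u = [si * gi for si, gi in zip(s, g)]            # chain rule: dL/du = s * dL/dw
    u = [ui - lr * gui for ui, gui in zip(u, grad_u)]     # steepest descent in u-space
    new_w = [si * ui for si, ui in zip(s, u)]             # map back: w = s * u
    return new_w, v


def quad_grad(w, c=(1.0, 100.0)):
    """Gradient of the toy loss 0.5 * sum(c_i * w_i**2) (hypothetical example)."""
    return [ci * wi for ci, wi in zip(c, w)]


# Run both updates side by side from the same starting point.
w_a, v_a = [1.0, 1.0], [0.0, 0.0]
w_b, v_b = [1.0, 1.0], [0.0, 0.0]
for _ in range(5):
    w_a, v_a = rmsprop_step(w_a, quad_grad(w_a), v_a, lr=0.01)
    w_b, v_b = scaled_sgd_step(w_b, quad_grad(w_b), v_b, lr=0.01)
```

Both trajectories agree up to floating-point rounding. The learning rate `lr` enters the `u`-space update exactly as it would in plain steepest descent. This is why a step-size rule designed for steepest descent can, in principle, be applied to the scaled coordinates.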

Interpreting Adaptive Gradient Methods by Parameter Scaling for Learning-Rate-Free Optimization