Existing learning rate schedules that do not require specification of the
optimization stopping step T are greatly out-performed by learning rate
schedules that depend on T. We propose an approach that avoids the need for
this stopping time by eschewing the use of schedules entirely, while exhibiting
state-of-the-art performance compared to schedules across a wide family of
problems ranging from convex problems to large-scale deep learning problems.
Our Schedule-Free approach introduces no additional hyper-parameters over
standard optimizers with momentum. Our method is a direct consequence of a new
theory we develop that unifies scheduling and iterate averaging. An open source
implementation of our method is available
(this https URL).

不依赖于优化停止步骤 T 的现有学习率调度比依赖 T 的学习率调度性能更好。我们提出了一种完全避免使用调度的方法，同时在从凸问题到大规模深度学习问题的广泛问题范围内展示了与调度相比的最先进性能。我们的无调度方法与带有动量的标准优化器没有额外的超参数。我们的方法是我们开发的一种新理论的直接结果，该理论统一了调度和迭代平均化。我们的方法的开源实现可在此 URL 找到。