BriefGPT.xyz
Aug, 2019
On the Variance of the Adaptive Learning Rate and Beyond
Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu...
TL;DR
The paper examines why the learning rate warmup heuristic is so reliable at stabilizing training, accelerating convergence, and improving generalization. It finds that the adaptive learning rate has problematically large variance in the early stages of training, interprets warmup as a variance-reduction technique, and proposes a new variant, RAdam, that rectifies the variance of the adaptive learning rate. Experiments demonstrate its effectiveness and robustness.
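To make the rectification idea concrete, here is a minimal sketch of one RAdam update for a single scalar parameter, following the update rule published in the paper; the function name and the quadratic toy objective in the usage note are illustrative choices, not part of the paper.

```python
import math

def radam_step(theta, grad, m, v, t,
               lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One RAdam update for a scalar parameter (sketch of the paper's rule)."""
    # Adam-style exponential moving averages of the gradient and its square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment

    # Length of the approximated SMA of the second moment.
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    rho_t = rho_inf - 2.0 * t * beta2 ** t / (1.0 - beta2 ** t)

    if rho_t > 4.0:
        # Variance of the adaptive lr is tractable: rectify it with r_t.
        v_hat = math.sqrt(v / (1 - beta2 ** t))
        r_t = math.sqrt(((rho_t - 4) * (rho_t - 2) * rho_inf) /
                        ((rho_inf - 4) * (rho_inf - 2) * rho_t))
        theta = theta - lr * r_t * m_hat / (v_hat + eps)
    else:
        # Early steps: variance is undefined, fall back to SGD with momentum.
        theta = theta - lr * m_hat
    return theta, m, v
```

Minimizing a toy quadratic `f(x) = x**2` with this step shows the behavior the paper describes: the first few iterations take the un-adapted momentum branch (an implicit warmup), after which `rho_t` exceeds 4 and the rectified adaptive branch takes over with a small effective step size that grows as more gradients accumulate.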
Abstract
The learning rate warmup heuristic achieves remarkable success in stabilizing training, accelerating convergence, and improving generalization for adaptive stochastic optimization algorithms like RMSprop and Adam.