May, 2023
Revisiting Gradient Clipping: Stochastic bias and tight convergence guarantees
Anastasia Koloskova, Hadrien Hendrikx, Sebastian U. Stich
TL;DR
This paper studies gradient clipping in stochastic gradient descent, characterizes how the clipping threshold affects the convergence guarantees together with matching upper and lower bounds, and further discusses the shortcomings of the clipping mechanism and possible remedies.
Abstract
Gradient clipping is a popular modification to standard (stochastic) gradient descent, at every iteration limiting the gradient norm to a certain value $c > 0$. It is widely used, for example, for stabilizing the training of …
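To make the clipping operator above concrete, here is a minimal sketch in plain NumPy (the names clip_gradient and clipped_sgd_step are illustrative, not taken from the paper): the gradient is left unchanged if its norm is at most $c$, and is rescaled to norm $c$ otherwise, i.e. $\mathrm{clip}_c(g) = \min\left(1, \tfrac{c}{\|g\|}\right) g$.

import numpy as np

def clip_gradient(g, c):
    # Rescale g so that its Euclidean norm is at most c (c > 0 is the clipping threshold).
    norm = np.linalg.norm(g)
    if norm > c:
        return (c / norm) * g
    return g

def clipped_sgd_step(x, stochastic_grad, step_size, c):
    # One iteration of clipped SGD: x_{t+1} = x_t - step_size * clip_c(g_t),
    # where g_t is a stochastic gradient evaluated at x_t.
    g = stochastic_grad(x)
    return x - step_size * clip_gradient(g, c)

Whenever the gradient norm stays below $c$ the update coincides with plain SGD; otherwise each step has length at most step_size * c, and it is this clipped stochastic update whose bias and convergence behaviour the paper analyzes.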