BriefGPT.xyz
Jun, 2024
A Clipped Trip: the Dynamics of SGD with Gradient Clipping in High-Dimensions
Noah Marshall, Ke Liang Xiao, Atish Agarwala, Elliot Paquette
TL;DR
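By studying gradient clipping in streaming stochastic gradient descent, the authors find that clipping can offer a performance advantage in certain noise settings, and they discuss the connection between clipping in high dimensions and neural network training.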
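To make the setting in the TL;DR concrete, here is a minimal sketch of streaming (one-pass) SGD with norm-based gradient clipping. Everything specific in it is an assumption made for illustration and is not taken from the paper: the least-squares objective, the Gaussian data and label noise, the hypothetical helper names `clip_by_norm` and `streaming_clipped_sgd`, and the values of the learning rate and clipping threshold.

```python
import math
import random


def clip_by_norm(grad, threshold):
    """Rescale grad so that its Euclidean norm is at most threshold."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= threshold or norm == 0.0:
        return list(grad)
    scale = threshold / norm
    return [g * scale for g in grad]


def streaming_clipped_sgd(dim=20, steps=3000, lr=0.2, threshold=1.0, seed=0):
    """One-pass SGD on least squares; each step draws a fresh sample.

    Returns (initial_error, final_error), the distance to the target
    before and after training. All settings here are illustrative.
    """
    rng = random.Random(seed)
    w_star = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # hypothetical target
    w = [0.0] * dim
    initial_error = math.sqrt(sum((a - b) ** 2 for a, b in zip(w, w_star)))

    for _ in range(steps):
        # Fresh sample: features scaled so that ||x||^2 is about 1.
        x = [rng.gauss(0.0, 1.0) / math.sqrt(dim) for _ in range(dim)]
        y = sum(xi * wi for xi, wi in zip(x, w_star)) + rng.gauss(0.0, 0.1)

        # Gradient of 0.5 * (x . w - y)^2 with respect to w.
        residual = sum(xi * wi for xi, wi in zip(x, w)) - y
        grad = [residual * xi for xi in x]

        # Clip, then take the SGD step.
        grad = clip_by_norm(grad, threshold)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]

    final_error = math.sqrt(sum((a - b) ** 2 for a, b in zip(w, w_star)))
    return initial_error, final_error
```

Calling `streaming_clipped_sgd()` returns the distance to the target before and after training. Varying `threshold` or swapping in a heavier-tailed noise distribution is one way to explore when clipping helps or hurts, though this toy setup makes no claim to reproduce the paper's results.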
Abstract
The success of modern machine learning is due in part to the adaptive optimization methods that have been developed to deal with the difficulties of training large models over complex datasets. One such method is …