BriefGPT.xyz
Jul, 2019
Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization
Xinyan Li, Qilong Gu, Yingxue Zhou, Tiancong Chen, Arindam Banerjee
TL;DR
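By analyzing the Hessian of the training loss and related quantities, this paper investigates three questions about stochastic gradient descent (SGD), including its optimization dynamics and generalization behavior, and supports its theoretical results with extensive experiments on synthetic data, MNIST, and CIFAR-10.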
Abstract
While stochastic gradient descent (SGD) and variants have been surprisingly successful for training deep nets, several aspects of the optimization ...