BriefGPT.xyz
Nov, 2018
The Full Spectrum of Deep Net Hessians At Scale: Dynamics with SGD Training and Sample Size
Vardan Papyan
TL;DR
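Using state-of-the-art tools from high-dimensional numerical linear algebra, the paper efficiently approximates the Hessian spectrum over the very large parameter spaces of modern deep networks. It finds that the Hessian exhibits "spiked" behavior, and it analyzes separately how each term evolves with training and with sample size.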
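As a rough illustration of how a spectrum can be approximated without ever forming the full matrix, here is a minimal sketch. It rests on several assumptions: the summary does not name the algorithm used, so Lanczos iteration is swapped in as one standard matrix-free technique; the helper `lanczos_ritz_values` is hypothetical; and a small synthetic "bulk plus two spikes" matrix stands in for a real Hessian. For an actual network, `matvec` would be a Hessian-vector product obtained by automatic differentiation.

```python
import numpy as np


def lanczos_ritz_values(matvec, dim, num_steps, seed=0):
    """Approximate eigenvalues of a symmetric operator from matvec calls only.

    Hypothetical helper for illustration: plain Lanczos iteration with full
    reorthogonalization, returning the eigenvalues (Ritz values) of the
    small tridiagonal matrix it builds.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    basis = [v]
    alphas, betas = [], []
    for k in range(num_steps):
        w = matvec(basis[-1])
        alphas.append(basis[-1] @ w)
        # Remove components along all previous Lanczos vectors
        # (done twice for numerical stability).
        q = np.stack(basis, axis=1)
        w = w - q @ (q.T @ w)
        w = w - q @ (q.T @ w)
        beta = np.linalg.norm(w)
        if k == num_steps - 1 or beta < 1e-10:
            break
        betas.append(beta)
        basis.append(w / beta)
    tridiag = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    return np.linalg.eigvalsh(tridiag)


# Synthetic stand-in for a Hessian (an assumption, not data from the paper):
# a random symmetric "bulk" plus two large rank-one "spikes".
dim = 200
rng = np.random.default_rng(1)
g = rng.standard_normal((dim, dim))
bulk = (g + g.T) / np.sqrt(2 * dim)
u, _ = np.linalg.qr(rng.standard_normal((dim, 2)))
hessian_like = bulk + u @ np.diag([20.0, 10.0]) @ u.T

# Only matrix-vector products are used, as would be the case for a real network.
ritz = lanczos_ritz_values(lambda x: hessian_like @ x, dim, num_steps=60)
print("two largest Ritz values:", ritz[-2:])
```

In this toy setting the two largest Ritz values should sit well apart from the rest, which is the kind of outlier structure "spiked" refers to; recovering the full bulk density for a real network takes more machinery than this sketch shows.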
Abstract
Previous works observed the spectrum of the Hessian of the training loss of deep neural networks. However, the networks considered were of minuscule size. We apply state-of-the-art tools in modern high-dimensional numer…