BriefGPT.xyz
Feb, 2019
A Generalization Theory of Gradient Descent for Learning Over-parameterized Deep ReLU Networks
Yuan Cao, Quanquan Gu
TL;DR
By deriving an algorithm-dependent generalization error bound, the paper shows that over-parameterized deep neural networks, under suitable random initialization, can achieve arbitrarily small generalization error when trained with gradient descent.
Abstract
Empirical studies show that gradient-based methods can learn deep neural networks (DNNs) with very good generalization performance in the over-parameterized regime.
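To make the setting concrete, here is a minimal sketch (not the paper's construction, and all names and hyperparameters are illustrative assumptions): plain gradient descent on a wide two-layer ReLU network from random initialization, showing the training loss dropping substantially — the empirical behavior the abstract refers to.

```python
import numpy as np

# Illustrative toy example: over-parameterized two-layer ReLU network
# (hidden width m much larger than sample count n) trained by gradient
# descent on squared loss from random initialization.
rng = np.random.default_rng(0)

n, d, m = 20, 5, 1000                # n samples, input dim d, width m >> n
X = rng.standard_normal((n, d))      # random inputs
y = rng.standard_normal(n)           # arbitrary real-valued labels

W = rng.standard_normal((d, m)) / np.sqrt(d)   # random first-layer weights
a = rng.choice([-1.0, 1.0], m) / np.sqrt(m)    # fixed second-layer weights

def forward(W):
    H = np.maximum(X @ W, 0.0)       # ReLU hidden activations
    return H @ a                     # network outputs, one per sample

init_loss = 0.5 * np.mean((forward(W) - y) ** 2)

lr = 0.5
for step in range(500):
    err = forward(W) - y
    mask = (X @ W > 0).astype(float)            # ReLU active/inactive pattern
    # Gradient of 0.5 * mean squared error with respect to W:
    # dL/dw_j = (1/n) * sum_i (f_i - y_i) * a_j * 1[x_i . w_j > 0] * x_i
    grad = X.T @ ((err[:, None] * a[None, :]) * mask) / n
    W -= lr * grad

final_loss = 0.5 * np.mean((forward(W) - y) ** 2)
print(init_loss, final_loss)
```

With a sufficiently wide network and a modest step size, the training loss decreases markedly even though the labels are random — the regime in which the paper's generalization analysis applies.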