BriefGPT.xyz
Nov, 2019
Global Convergence of Gradient Descent for Deep Linear Residual Networks
Lei Wu, Qingcan Wang, Chao Ma
TL;DR
Motivated by avoiding the stable manifolds of saddle points, the authors propose the zero-asymmetric (ZAS) initialization and prove that, under this initialization, gradient descent converges to an ε-optimal point within O(L^3 log(1/ε)) iterations. In particular, when the depth L is large, this shows that both the residual structure and the initialization are crucial for optimizing deep linear neural networks.
Abstract
We analyze the global convergence of gradient descent for deep linear residual networks by proposing a new initialization: zero-asymmetric (ZAS) initialization. It is motivated by avoiding stable manifolds of saddle points.
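The setting above can be illustrated with a minimal sketch: a deep linear residual network h ← (I + W_l) h whose residual weights start at zero, in the spirit of ZAS initialization, trained by plain gradient descent on a linear regression target. This is an assumption-laden toy, not the paper's exact construction or its O(L^3 log(1/ε)) analysis; the width, depth, data, and learning rate are all illustrative choices.

```python
import numpy as np

# Illustrative sketch (assumptions, not the paper's exact setup): a deep
# linear residual network trained with gradient descent, residual weights
# initialized at zero in the spirit of ZAS initialization.

rng = np.random.default_rng(0)
d, L, n = 4, 8, 64                       # width, depth, sample count (assumed)
X = rng.standard_normal((d, n))
T_true = rng.standard_normal((d, d))     # hypothetical target linear map
Y = T_true @ X

# Zero-initialized residual weights: the network starts as the identity map.
Ws = [np.zeros((d, d)) for _ in range(L)]

def forward(Ws, X):
    """Return the network output and per-layer activations."""
    acts = [X]
    for W in Ws:
        acts.append(acts[-1] + W @ acts[-1])   # residual block: h <- (I + W) h
    return acts[-1], acts

def loss(Ws):
    out, _ = forward(Ws, X)
    return 0.5 * np.mean(np.sum((out - Y) ** 2, axis=0))

loss0 = loss(Ws)
lr = 0.005                               # small step size, assumed
for step in range(300):
    out, acts = forward(Ws, X)
    grad_h = (out - Y) / n               # gradient of the loss w.r.t. the output
    for l in reversed(range(L)):         # backpropagate through residual blocks
        grad_W = grad_h @ acts[l].T      # gradient w.r.t. W_l
        grad_h = grad_h + Ws[l].T @ grad_h
        Ws[l] -= lr * grad_W
loss_final = loss(Ws)
```

Starting from the zero initialization, every layer receives the same well-conditioned gradient at step 0, and the loss decreases monotonically for a small enough step size; this mirrors, at toy scale, why the residual structure plus a careful initialization makes the optimization landscape benign.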