AbstractWe provide a theoretical explanation for the superb performance of ResNet via the study of deep linear networks and some nonlinear variants. We show that with or without nonlinearities, by adding shortcuts that have depth two, the condition number of the Hessian of the loss function at the zero initial point is depth-invariant, which makes
→