Despite the huge success of deep learning, our understanding to how the non-convex neural networks are trained remains rather limited. Most of existing theoretical works only tackle neural networks with one hidden layer, and little is known for multi-layer neural networks.