Batch Normalization (BN) normalizes the output of each hidden neuron to zero mean and unit variance, which improves convergence and generalization when training neural networks. This work studies these phenomena theoretically. We analyze BN using a basic building block of neural networks, consisting of a weight layer, a BN layer, and a nonlinear activation function. This simple network helps us understand the characteristics of BN, and the results are generalized to deep models in numerical studies. We explore BN in three aspects. First, by viewing BN as a stochastic process, we derive an analytical form of the regularization inherent in BN. Second, the optimization dynamics with this regularization show that BN enables training to converge with large maximum and effective learning rates. Third, BN's generalization with regularization is explored using random matrix theory and statistical mechanics. Both simulations and experiments support our analyses.
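
As an illustration of the building block described above (weight layer, BN layer, nonlinear activation), the following is a minimal NumPy sketch for a single hidden neuron. It is not the authors' implementation: the function name `bn_block`, the choice of ReLU as the nonlinearity, the parameter names `gamma`, `beta`, and `eps`, and the synthetic Gaussian inputs are all assumptions made for the example.

```python
import numpy as np

def bn_block(x, w, gamma=1.0, beta=0.0, eps=1e-5):
    """Weight layer -> batch normalization -> ReLU for one hidden neuron.

    x: mini-batch of inputs, shape (batch, features)
    w: weight vector, shape (features,)
    gamma, beta: BN scale and shift (hypothetical defaults)
    """
    h = x @ w                               # pre-activations, shape (batch,)
    mu = h.mean()                           # mini-batch mean
    var = h.var()                           # mini-batch (biased) variance
    h_hat = (h - mu) / np.sqrt(var + eps)   # zero mean, ~unit variance
    y = np.maximum(gamma * h_hat + beta, 0.0)  # ReLU, assumed nonlinearity
    return h_hat, y, mu

rng = np.random.default_rng(0)
w = rng.normal(size=10)

# Two mini-batches drawn from the same (synthetic) data distribution.
x1 = rng.normal(loc=3.0, scale=2.0, size=(256, 10))
x2 = rng.normal(loc=3.0, scale=2.0, size=(256, 10))

h_hat1, y1, mu1 = bn_block(x1, w)
h_hat2, y2, mu2 = bn_block(x2, w)

print("normalized mean:", h_hat1.mean())
print("normalized variance:", h_hat1.var())
print("batch means:", mu1, mu2)
```

The normalized pre-activations have mean close to zero and variance close to one within each mini-batch, while the batch statistics themselves differ from one mini-batch to the next. That batch-to-batch fluctuation is one way to read the abstract's view of BN as a stochastic process.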

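By analyzing a basic building block of neural networks, we find that batch normalization acts as an implicit regularizer that can be decomposed into population normalization and gamma decay, the latter serving as an explicit regularization. This improves training convergence and generalization. The analysis also characterizes the learning dynamics under this regularization, and both theory and experiments show that batch normalization in convolutional neural networks shares the regularization properties described in this analysis.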

Understanding Regularization in Batch Normalization