A popular method to reduce the training time of deep neural networks is to
normalize activations at each layer. Although various normalization schemes
have been proposed, they all follow a common theme: normalize across spatial
dimensions and discard the extracted statistics. In this p