While the depth of modern convolutional neural networks (CNNs) surpasses that
of the pioneering networks with a significant margin, the traditional way of
appending supervision only over the final classifier and progressively
propagating gradient flow upstream remains the training main