We propose a practical method for $L_0$ norm regularization for neural
networks: pruning the network during training by encouraging weights to become
exactly zero. Such regularization is interesting since (1) it can greatly speed
up training and inference, and (2) it can improve generalization.
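To make the regularized quantity concrete (this sketch is an illustration, not the paper's training method): the $L_0$ norm of a weight tensor simply counts its nonzero entries, so driving it down means pruning weights to exactly zero. A minimal NumPy example, with a hypothetical weight matrix:

```python
import numpy as np

# Hypothetical weight matrix in which some entries have been pruned to exactly zero.
w = np.array([[0.5, 0.0, -1.2],
              [0.0, 0.0,  0.3]])

# The L0 "norm" counts nonzero entries -- the quantity an L0 regularizer penalizes.
l0 = np.count_nonzero(w)

# Sparsity: fraction of weights that are exactly zero.
sparsity = 1.0 - l0 / w.size

print(l0, sparsity)  # 3 0.5
```

Note that this count is piecewise constant in the weights, hence non-differentiable, which is why practical $L_0$ regularization requires a smoothed or stochastic surrogate rather than penalizing the count directly.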