A continuing mystery in understanding the empirical success of deep neural
networks is their ability to achieve zero training error yet generalize well,
even when the training data is noisy and the number of parameters exceeds the
number of data points. We investigate this overparameterized regime in