Modern deep neural networks are highly over-parameterized compared to the
data on which they are trained, yet they often generalize remarkably well. A
flurry of recent work has asked: why do deep networks not overfit to their
training data? In this work, we make a series of empirical o