Unregularized deep neural networks (DNNs) can easily overfit when the
sample size is limited. We argue that this is mostly due to the discriminative
nature of DNNs, which directly model the conditional probability (or score) of
labels given the input. The ignorance of input distribution ma