early stopping is a widely used technique to prevent poor generalization
performance when training an over-expressive model by means of gradient-based
optimization. To find a good point to halt the optimizer, a common practice is
to split the dataset into a training and a smaller valid