Abstract

Recent research has suggested that when training neural networks, flat local minima of the empirical risk may cause the model to generalize better. Motivated by this understanding, we propose a new
regularization scheme. In this scheme, referred to as