Injecting artificial noise into gradient descent (GD) is commonly employed to improve the performance of machine learning models. Usually, uncorrelated noise is used in such perturbed gradient descent (PGD) methods. It is, however, not known if this is optimal or whether other types of noise could provide better generalization performance. In this paper, we zoom in on the problem of correlating the perturbations of consecutive PGD steps. We consider a variety of objective functions for which we find that GD with anticorrelated perturbations ("Anti-PGD") generalizes significantly better than GD and standard (uncorrelated) PGD. To support these experimental findings, we also derive a theoretical analysis that demonstrates that Anti-PGD moves to wider minima, while GD and PGD remain stuck in suboptimal regions or even diverge. This new connection between anticorrelated noise and generalization opens the field to novel ways to exploit noise for training machine learning models.
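The difference between GD, PGD, and Anti-PGD can be illustrated with a minimal sketch. The abstract does not specify how the anticorrelated perturbations are constructed, so the code assumes one simple construction: each perturbation is the difference of two consecutive i.i.d. Gaussian draws, which gives consecutive perturbations a correlation of about -1/2. The function names `noise_sequence` and `perturbed_gd`, the Gaussian choice, the zero initial draw, and the decision not to scale the noise by the learning rate are all assumptions made for illustration, not details taken from the text.

```python
import numpy as np

def noise_sequence(rng, sigma, steps, dim, mode):
    """Return a (steps, dim) array of perturbations for the given mode.

    mode = "gd":   no noise
    mode = "pgd":  i.i.d. Gaussian noise (uncorrelated across steps)
    mode = "anti": increments of i.i.d. Gaussian draws (assumed construction),
                   so consecutive perturbations are anticorrelated
    """
    xi = sigma * rng.standard_normal((steps + 1, dim))
    if mode == "gd":
        return np.zeros((steps, dim))
    if mode == "pgd":
        return xi[1:]
    if mode == "anti":
        xi[0] = 0.0  # assumed initial draw
        return xi[1:] - xi[:-1]
    raise ValueError(f"unknown mode: {mode!r}")

def perturbed_gd(grad, x0, lr, sigma, steps, mode="anti", seed=0):
    """Run gradient descent with the chosen perturbation scheme."""
    rng = np.random.default_rng(seed)
    x = np.atleast_1d(np.array(x0, dtype=float))
    noise = noise_sequence(rng, sigma, steps, x.size, mode)
    for k in range(steps):
        # Plain GD step plus the (possibly zero) perturbation for step k.
        x = x - lr * grad(x) + noise[k]
    return x
```

Under this construction the anticorrelated perturbations telescope: their sum over all steps equals the last Gaussian draw, so the accumulated noise stays bounded in variance, whereas the sum of i.i.d. perturbations grows with the number of steps. Calling `perturbed_gd(grad, x0, lr, sigma, steps, mode=...)` with the same `grad` and the three modes is one way to compare the schemes on a toy objective.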

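Summary: This paper studies the injection of artificial noise during the training of machine learning models to improve performance. It finds that gradient descent with anticorrelated noise (Anti-PGD) generalizes better to new data than both plain GD and standard PGD with uncorrelated noise, a finding that suggests new ways to exploit noise when training machine learning models.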

Anticorrelated Noise Injection for Improved Generalization