On the Generalization of Models Trained with Stochastic Gradient Descent: Information-Theoretic Bounds and Implications

Oct, 2021

On the Generalization of Models Trained with SGD: Information-Theoretic Bounds and Implications

Ziqiao Wang, Yongyi Mao

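TL;DR: Building on recent work by Neu et al. (2021), this paper presents new information-theoretic upper bounds on the generalization error of machine learning models. Applying these bounds, it analyzes the generalization behavior of linear and ReLU networks and derives insights into SGD training, along with a new, simple regularization scheme. Experiments show that this regularization scheme performs comparably to the current state of the art.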

Abstract

This paper follows up on a recent work of (Neu, 2021) and presents new and tighter information-theoretic upper bounds for the generalization error of machine learning models, such as →