The notion of implicit bias, or implicit regularization, has been suggested as a means to explain the surprising generalization ability of modern overparameterized learning algorithms. This notion refers to the tendency of the optimization algorithm to converge to a certain structured solution that often generalizes well. Recently, several papers have studied implicit regularization and identified this phenomenon in various scenarios. We revisit this paradigm in arguably the simplest non-trivial setup, and study the implicit bias of Stochastic Gradient Descent (SGD) in the context of Stochastic Convex Optimization. As a first step, we provide a simple construction that rules out the existence of a \emph{distribution-independent} implicit regularizer that governs the generalization ability of SGD. We then demonstrate a learning problem that rules out a very general class of \emph{distribution-dependent} implicit regularizers from explaining generalization, which includes strongly convex regularizers as well as non-degenerate norm-based regularizations. Certain aspects of our constructions point to significant difficulties in providing a comprehensive explanation of an algorithm's generalization performance by solely arguing about its implicit regularization properties.

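This paper studies the implicit bias, or implicit regularization, of stochastic gradient descent in stochastic convex optimization. It gives a simple construction that rules out the existence of a distribution-independent implicit regularizer governing the generalization ability of SGD, and presents a learning problem showing that a general class of distribution-dependent implicit regularizers cannot explain generalization. Together, these results indicate that there are significant difficulties in fully explaining an algorithm's generalization performance through its implicit regularization properties alone.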

Can Implicit Bias Explain Generalization? Stochastic Convex Optimization as a Case Study