Mathematically characterizing the implicit regularization induced by gradient-based optimization is a longstanding pursuit in the theory of deep learning. A widespread hope is that a characterization based on minimization of norms may apply, and a standard test-bed for studying this prospect is matrix factorization (matrix completion via linear neural networks). It is an open question whether norms can explain the implicit regularization in matrix factorization. The current paper resolves this open question in the negative, by proving that there exist natural matrix factorization problems on which the implicit regularization drives all norms (and quasi-norms) towards infinity. Our results suggest that, rather than perceiving the implicit regularization via norms, a potentially more useful interpretation is minimization of rank. We demonstrate empirically that this interpretation extends to a certain class of non-linear neural networks, and hypothesize that it may be key to explaining generalization in deep learning.

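Using matrix factorization as a mathematical model, the paper studies the implicit regularization induced by gradient-based optimization. It finds that norms cannot account for the implicit regularization in matrix factorization, and shows empirically that rank minimization is a more useful interpretation, one that may also help explain generalization in deep learning.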
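To make the setting concrete, here is a minimal sketch of matrix completion by gradient descent on a product of square matrices (a linear neural network). It is an illustration under stated assumptions, not the paper's experimental setup: the names `chain` and `fit_factorization`, the 3x3 all-ones target, the observation mask, the initialization scale, the step size, and the number of steps are all hypothetical choices made here.

```python
import numpy as np


def chain(mats, d):
    """Return mats[-1] @ ... @ mats[0], or the d x d identity if mats is empty."""
    out = np.eye(d)
    for m in mats:
        out = m @ out
    return out


def fit_factorization(target, mask, depth=2, init_scale=0.1, lr=0.05,
                      steps=2000, seed=0):
    """Fit the observed entries of `target` with a product of `depth` matrices.

    All hyperparameters are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    d = target.shape[0]
    # Small random initialization of every factor.
    factors = [init_scale * rng.standard_normal((d, d)) for _ in range(depth)]
    losses = []
    for _ in range(steps):
        W = chain(factors, d)
        residual = mask * (W - target)  # error on observed entries only
        losses.append(0.5 * float(np.sum(residual ** 2)))
        grads = []
        for j in range(depth):
            right = chain(factors[:j], d)      # factors applied before factor j
            left = chain(factors[j + 1:], d)   # factors applied after factor j
            grads.append(left.T @ residual @ right.T)
        factors = [f - lr * g for f, g in zip(factors, grads)]
    return chain(factors, d), losses


# Hypothetical toy problem: entries where mask == 0 are unobserved,
# so the corresponding values in `target` are never used.
target = np.ones((3, 3))
mask = np.array([[1, 1, 0],
                 [1, 0, 1],
                 [0, 1, 1]], dtype=float)

W, losses = fit_factorization(target, mask)
print("loss: first", losses[0], "last", losses[-1])
print("singular values:", np.linalg.svd(W, compute_uv=False))
print("nuclear norm:", np.linalg.norm(W, "nuc"))
```

Printing both the singular values and a norm of the recovered matrix gives a simple way to compare the two interpretations discussed in the abstract (norm minimization versus rank minimization) on a toy instance. A single small run like this is only suggestive and says nothing about the paper's formal results.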

Implicit Regularization in Deep Learning May Not Be Explainable by Norms