Matrix factorization is a simple and natural testbed for investigating the implicit regularization of gradient descent. Gunasekar et al. (2018) conjectured that gradient flow with infinitesimal initialization converges to the solution that minimizes the nuclear norm, but a series of recent papers argued that the language of norm minimization is not sufficient to fully characterize the implicit regularization. In this work, we provide theoretical and empirical evidence that, for depth-2 matrix factorization, gradient flow with infinitesimal initialization is mathematically equivalent to a simple heuristic rank minimization algorithm, Greedy Low-Rank Learning, under some reasonable assumptions. This generalizes the rank minimization view from previous works to a much broader setting and enables us to construct counter-examples that refute the conjecture of Gunasekar et al. (2018). We also extend the results to the case where depth $\ge 3$, and we show that the benefit of being deeper is that the above convergence has a much weaker dependence on the initialization magnitude, so that this rank minimization is more likely to take effect for initialization at a practical scale.
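The incremental, rank-by-rank behavior that motivates the rank minimization view can be observed in a small numerical experiment. The sketch below is not the Greedy Low-Rank Learning algorithm itself and is not taken from the paper. It only runs plain gradient descent with a small random initialization on a toy depth-2 symmetric factorization $W = UU^\top$ and records how the effective rank of $W$ evolves. The fully observed PSD target, the function name `gd_small_init`, the step size, the initialization scale, and the 0.05 eigenvalue threshold are all assumptions chosen for illustration.

```python
import numpy as np


def gd_small_init(target, init_scale=1e-3, lr=0.01, steps=5000, seed=0):
    """Run gradient descent on f(U) = 0.5 * ||U U^T - target||_F^2.

    Returns the final product U U^T and, for every step, the eigenvalues
    of U U^T in descending order (recorded before that step's update).
    """
    rng = np.random.default_rng(seed)
    d = target.shape[0]
    U = init_scale * rng.standard_normal((d, d))  # small random initialization
    spectra = []
    for _ in range(steps):
        W = U @ U.T
        spectra.append(np.linalg.eigvalsh(W)[::-1])
        # Gradient of f with respect to U is 2 (U U^T - target) U,
        # because the residual is symmetric.
        U = U - lr * 2.0 * (W - target) @ U
    return U @ U.T, np.array(spectra)


# Hypothetical rank-3 PSD target with well-separated eigenvalues 4, 1, 0.25.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
M = Q @ np.diag([4.0, 1.0, 0.25, 0.0, 0.0]) @ Q.T

W, spectra = gd_small_init(M)

# Effective rank per step: number of eigenvalues above an assumed threshold.
ranks = (spectra > 0.05).sum(axis=1)
for r in range(1, 4):
    first_step = int(np.argmax(ranks >= r))
    print(f"effective rank first reaches {r} at step {first_step}")
print("final error:", np.linalg.norm(W - M))
```

In this toy setting the eigen-directions of the target are picked up one at a time, largest eigenvalue first, so the iterate passes through approximately rank-1 and rank-2 matrices before fitting the full rank-3 target. Shrinking `init_scale` lengthens the plateaus between these phases, which is the regime the infinitesimal-initialization analysis refers to.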

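Using depth-2 matrix factorization together with theoretical and empirical evidence, we show that gradient flow with infinitesimal initialization is equivalent to a simple heuristic rank minimization algorithm. We also extend the result to depth $\ge 3$ and show that the advantage of depth is a weaker dependence on the initialization magnitude, so this rank minimization is more likely to take effect in practice.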

Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning