Implicit bias of SGD in $L_{2}$-regularized linear DNNs: One-way jumps from high to low rank

May, 2023

Zihan Wang, Arthur Jacot

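TL;DR: With SGD there is some probability of jumping from a higher-rank minimum to a lower-rank one, but the probability of jumping back is zero. In matrix completion tasks, the goal is to converge to the local minimum with the smallest rank.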

Abstract

The $L_{2}$-regularized loss of deep linear networks (DLNs) with more than one hidden layer has multiple local minima, corresponding to matrices with different ranks. In tasks such as →