In this paper, we study the implicit bias of gradient descent for sparse regression. We extend results on regression with quadratic parametrization, which amounts to depth-2 diagonal linear networks, to more general depth-N networks, under more realistic settings of noise and correlated designs. We show that early stopping is crucial for gradient descent to converge to a sparse model, a phenomenon that we call implicit sparse regularization. This result is in sharp contrast to known results for noiseless and uncorrelated-design cases. We characterize the impact of depth and early stopping and show that for a general depth parameter N, gradient descent with early stopping achieves minimax optimal sparse recovery with sufficiently small initialization and step size. In particular, we show that increasing depth enlarges the scale of working initialization and the early-stopping window, which leads to more stable gradient paths for sparse recovery.
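The setting described above can be illustrated with a minimal sketch. It assumes the depth-N diagonal parametrization takes the form w = u^N - v^N (elementwise), which the abstract does not state explicitly, and it uses a held-out validation set as a practical stand-in for the early-stopping window. The function name `early_stopped_gd`, the synthetic data, and all hyperparameter values are hypothetical choices for illustration only, not the authors' experimental setup.

```python
import numpy as np


def early_stopped_gd(X, y, X_val, y_val, depth=3, alpha=0.1, eta=0.01, max_steps=1500):
    """Gradient descent on w = u**depth - v**depth (assumed parametrization),
    returning the iterate with the lowest validation error."""
    n, d = X.shape
    # Small, identical initialization of both factors, so that w starts at zero.
    u = np.full(d, alpha)
    v = np.full(d, alpha)
    best_w, best_val, best_step = np.zeros(d), np.inf, 0
    for step in range(max_steps + 1):
        w = u**depth - v**depth
        # Early stopping proxy (an assumption): keep the best validation iterate.
        val = np.mean((X_val @ w - y_val) ** 2)
        if val < best_val:
            best_w, best_val, best_step = w.copy(), val, step
        # Gradient of the loss (1 / 2n) * ||X w - y||^2 with respect to w.
        grad_w = X.T @ (X @ w - y) / n
        # Chain rule through the elementwise powers; u and v update simultaneously.
        u, v = (
            u - eta * depth * u ** (depth - 1) * grad_w,
            v + eta * depth * v ** (depth - 1) * grad_w,
        )
    return best_w, best_step


# Synthetic sparse regression problem with Gaussian design and noise.
rng = np.random.default_rng(0)
n, d, k, sigma = 200, 300, 3, 0.1
w_star = np.zeros(d)
w_star[:k] = [1.0, -1.0, 1.0]
X = rng.standard_normal((n, d))
y = X @ w_star + sigma * rng.standard_normal(n)
X_val = rng.standard_normal((100, d))
y_val = X_val @ w_star + sigma * rng.standard_normal(100)

w_hat, stop_step = early_stopped_gd(X, y, X_val, y_val)
print("stopped at step", stop_step)
print("estimation error", np.linalg.norm(w_hat - w_star))
```

Varying `depth` and `alpha` in this sketch is one way to explore informally how depth interacts with the initialization scale and the stopping time. The validation-based rule is only a convenient heuristic here and should not be read as the stopping criterion analyzed in the paper.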

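This paper studies the implicit bias of gradient descent in sparse regression and extends results on regression with quadratic parametrization to more general depth-N networks. The results show that early stopping is crucial for achieving implicit sparse regularization and that, for a general depth parameter N, sufficiently small initialization and step size yield minimax-optimal sparse recovery.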

Implicit Sparse Regularization: The Impact of Depth and Early Stopping