In this work, we investigate the margin-maximization bias exhibited by gradient-based algorithms in classifying linearly separable data. We present an in-depth analysis of the specific properties of the velocity field associated with (normalized) gradients, focusing on their role in margin maximization. Inspired by this analysis, we propose a novel algorithm called Progressive Rescaling Gradient Descent (PRGD) and show that PRGD can maximize the margin at an {\em exponential rate}. This stands in stark contrast to all existing algorithms, which maximize the margin at a slow {\em polynomial rate}. Specifically, we identify mild conditions on the data distribution under which existing algorithms such as gradient descent (GD) and normalized gradient descent (NGD) {\em provably fail} to maximize the margin efficiently. To validate our theoretical findings, we present both synthetic and real-world experiments. Notably, PRGD also shows promise in improving generalization performance when applied to linearly non-separable datasets and deep neural networks.
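
To make the quantities in the abstract concrete, the following is a minimal sketch of the normalized margin and of one common form of normalized gradient descent on the exponential loss. It does not reproduce PRGD, whose update rule is not given here. The toy data `Z`, the choice of loss, the normalization of the gradient by the loss value, and the step size are all assumptions made for illustration only.

```python
import math

# Hypothetical toy data: each row is z_i = y_i * x_i for a linearly separable set.
# For this set the max-margin direction is (1, 0), with normalized margin 1.
Z = [(1.0, 2.0), (1.0, -2.0), (3.0, 0.5)]


def normalized_margin(w):
    """Return min_i <w, z_i> / ||w||, the quantity that margin maximization targets."""
    norm = math.hypot(w[0], w[1])
    return min(w[0] * z[0] + w[1] * z[1] for z in Z) / norm


def ngd(steps=200, lr=0.1):
    """Normalized gradient descent on L(w) = mean_i exp(-<w, z_i>).

    Assumption: the gradient is divided by the loss value, so each step is
    w <- w - lr * grad L(w) / L(w). Other normalizations exist.
    """
    w = [0.0, 0.0]
    for _ in range(steps):
        weights = [math.exp(-(w[0] * z[0] + w[1] * z[1])) for z in Z]
        total = sum(weights)
        # -grad L / L equals the weighted average of the z_i under these weights.
        w[0] += lr * sum(p * z[0] for p, z in zip(weights, Z)) / total
        w[1] += lr * sum(p * z[1] for p, z in zip(weights, Z)) / total
    return w


if __name__ == "__main__":
    w = ngd()
    print("direction:", w, "normalized margin:", normalized_margin(w))
```

On this small symmetric example the iterate aligns with the max-margin direction quickly, so it says nothing about the convergence rates discussed above; it only fixes what "margin" and "NGD" refer to.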

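By studying the margin-maximization bias of gradient-based algorithms in classifying linearly separable data, the paper proposes a new algorithm called Progressive Rescaling Gradient Descent (PRGD) that maximizes the margin at an exponential rate, in clear contrast to existing algorithms, which achieve only polynomial rates. The theoretical findings are validated on synthetic and real-world data, and PRGD also shows potential for improving generalization performance on linearly non-separable datasets and deep neural networks.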

Achieving Margin Maximization Exponentially Fast via Progressive Norm Rescaling