The Power of Normalization: Faster Evasion of Saddle Points

Nov, 2016

Kfir Y. Levy

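TL;DR: We analyze Normalized Gradient Descent (NGD), a heuristic for non-convex optimization, with suitably chosen parameters and injected noise. We show that this variant escapes saddle points and prove that it converges to a local minimum, at a rate faster than that of the fastest first-order algorithm, proposed by Ge et al. (2015). We apply the method to the online tensor decomposition problem and prove that, for this problem, escaping saddle points leads to convergence to a global minimum.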
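To make the idea concrete, here is a minimal sketch of normalized gradient descent with injected noise on a toy two-dimensional objective. It is an illustration under stated assumptions, not the algorithm or parameter schedule analyzed in the paper: the objective f(x, y) = 0.5*x^2 + 0.25*y^4 - 0.5*y^2 (saddle at the origin, minima at (0, ±1)), the function names `grad` and `noisy_ngd`, the fixed step size `eta`, and the noise scale `theta` are all hypothetical choices made for this example.

```python
import math
import random


def grad(x, y):
    """Gradient of the toy objective f(x, y) = 0.5*x**2 + 0.25*y**4 - 0.5*y**2."""
    return x, y**3 - y


def noisy_ngd(x, y, eta=0.01, theta=0.5, steps=2000, seed=0):
    """Normalized gradient descent with unit-norm noise (illustrative sketch).

    Each step moves a fixed distance eta along the negative gradient
    direction (the gradient magnitude is ignored) and adds a random
    unit vector scaled by eta * theta. All parameters are assumptions.
    """
    rng = random.Random(seed)
    for _ in range(steps):
        gx, gy = grad(x, y)
        norm = math.hypot(gx, gy)
        if norm > 1e-12:
            gx, gy = gx / norm, gy / norm  # keep only the direction
        else:
            gx, gy = 0.0, 0.0  # gradient vanishes, e.g. exactly at a saddle
        # Noise drawn uniformly from the unit circle.
        angle = rng.uniform(0.0, 2.0 * math.pi)
        nx, ny = math.cos(angle), math.sin(angle)
        x = x - eta * gx + eta * theta * nx
        y = y - eta * gy + eta * theta * ny
    return x, y


if __name__ == "__main__":
    # Without noise, an iterate started at the saddle never moves.
    print(noisy_ngd(0.0, 0.0, theta=0.0))
    # With noise, it leaves the saddle and settles near (0, 1) or (0, -1).
    print(noisy_ngd(0.0, 0.0))
```

Because the step length is fixed at `eta`, the iterate in this sketch does not converge exactly; it keeps moving within a small neighborhood of a minimum whose radius is on the order of `eta`.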

Abstract

A commonly used heuristic in non-convex optimization is normalized gradient descent (NGD) - a variant of gradient descent in which only the direction of the gradient is taken into account and its magnitude ignore