Adaptively Preconditioned Stochastic Gradient Langevin Dynamics

Jun, 2019

Adaptively Preconditioned Stochastic Gradient Langevin Dynamics

Chandrasekaran Anirudh Bhardwaj

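TL;DR: This work adaptively preconditions the noise injected into the parameters, bringing higher-order curvature information such as Fisher scoring into Stochastic Gradient Langevin Dynamics (SGLD). This lets the method escape regions of pathological curvature in deep neural networks effectively, converge at a speed comparable to first-order adaptive methods such as Adam and AdaGrad, and reach generalization performance on the test set equal to that of SGD.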

Abstract

Stochastic Gradient Langevin Dynamics (SGLD) infuses isotropic gradient noise into SGD to help navigate pathological curvature in the loss landscape of deep networks. The isotropic nature of the noise leads to poor scaling, and adaptive methods based on higher order curvature information such as F