TL;DR: This paper studies variance-reduced algorithms for empirical risk minimization with non-convex loss functions, in particular the non-convex versions of SVRG, SAGA and SARAH. It develops a minibatch analysis based on importance sampling and demonstrates improved training speed.
Abstract
We provide the first importance sampling variants of variance reduced
algorithms for empirical risk minimization with non-convex loss functions. In
particular, we analyze non-convex versions of SVRG, SAGA and SARAH.
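To make the importance-sampling idea concrete, here is a minimal, hypothetical sketch of SVRG with non-uniform sampling on a simple least-squares objective. The sampling probabilities proportional to per-example smoothness constants `L_i`, the function `svrg_importance`, and all parameter choices are illustrative assumptions, not the paper's exact scheme (which targets the non-convex setting with minibatches).

```python
import numpy as np

def svrg_importance(A, b, x0, step=0.005, epochs=30, inner=None, seed=0):
    """Hypothetical SVRG sketch with importance sampling on
    f(x) = (1/n) * sum_i (a_i^T x - b_i)^2 (a convex stand-in
    for illustration; the paper's analysis is non-convex)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    inner = inner or n
    L = 2.0 * np.sum(A * A, axis=1)       # per-example smoothness L_i = 2||a_i||^2
    p = L / L.sum()                       # importance-sampling distribution (assumed)
    x = x0.astype(float).copy()
    for _ in range(epochs):
        snap = x.copy()
        full_grad = (2.0 / n) * A.T @ (A @ snap - b)  # full gradient at snapshot
        for _ in range(inner):
            i = rng.choice(n, p=p)
            gi = 2.0 * A[i] * (A[i] @ x - b[i])           # grad of f_i at x
            gi_snap = 2.0 * A[i] * (A[i] @ snap - b[i])   # grad of f_i at snapshot
            # Unbiased variance-reduced estimate: reweight by 1/(n * p_i)
            v = (gi - gi_snap) / (n * p[i]) + full_grad
            x -= step * v
    return x
```

Dividing the stochastic correction by `n * p[i]` keeps the gradient estimate unbiased under the non-uniform distribution, which is the key requirement any importance-sampling variant must preserve.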