Adaptive stochastic gradient descent methods, such as AdaGrad, Adam, AdaDelta, Nadam, and AMSGrad, have proven effective in solving non-convex stochastic optimization problems, such as training deep neural networks. However, their convergence rates in the non-convex stochastic setting have received little attention, except for recent breakthrough results.
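For concreteness, the Adam-type methods mentioned above can be summarized by the following generic update (a standard textbook form, shown here only for illustration; the step size $\alpha_t$, moving-average parameters $\beta_1,\beta_2 \in [0,1)$, and smoothing constant $\epsilon > 0$ are generic symbols, not notation fixed by this paper):
\[
m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^{2}, \qquad
x_{t+1} = x_t - \alpha_t\, \frac{m_t}{\sqrt{v_t} + \epsilon},
\]
where $g_t$ denotes a stochastic gradient at $x_t$ and all operations are element-wise. AdaGrad corresponds to $\beta_1 = 0$ with $v_t$ replaced by the running sum $\sum_{s \le t} g_s^{2}$, while AMSGrad replaces $v_t$ by $\hat{v}_t = \max(\hat{v}_{t-1}, v_t)$ so that the effective step size is non-increasing.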