Stochastic gradient descent (SGD) is a classical method for building large-scale
machine learning models over big data. A stochastic gradient is typically
computed from a limited number of samples (known as a mini-batch), so it
potentially incurs high variance and causes the estima