Gradient descent (GD) methods are commonly employed in machine learning to optimize model parameters in an iterative fashion. For problems with massive datasets, the computation is distributed across many parallel computing servers (i.e., workers) to speed up the GD iterations.
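The distributed scheme described above can be sketched as follows: the dataset is partitioned into shards, each worker computes a partial gradient on its shard, and a master aggregates the partial gradients before taking a step. This is a minimal single-process simulation (the workers are a plain loop rather than actual servers); the least-squares objective, the function names, and all parameter values are illustrative assumptions, not taken from the source.

```python
import numpy as np

def worker_gradient(theta, X_shard, y_shard):
    # A worker's local gradient of the squared loss on its data shard.
    return X_shard.T @ (X_shard @ theta - y_shard)

def distributed_gd(X, y, num_workers=4, lr=0.1, steps=300):
    # Partition the dataset row-wise across the workers.
    X_shards = np.array_split(X, num_workers)
    y_shards = np.array_split(y, num_workers)
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        # In a real system these calls run in parallel on separate
        # servers; here they are simulated sequentially.
        grads = [worker_gradient(theta, Xs, ys)
                 for Xs, ys in zip(X_shards, y_shards)]
        # Master aggregates the partial gradients and updates theta.
        theta -= lr * sum(grads) / len(X)
    return theta

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
true_theta = np.array([1.0, -2.0, 0.5])
y = X @ true_theta
theta = distributed_gd(X, y)
```

Because the total gradient is a sum of per-shard gradients, the aggregated update is identical to what a single machine would compute on the full dataset; the speedup comes from evaluating the shard gradients concurrently.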