In this paper, we focus on approaches to parallelizing stochastic gradient
descent (SGD) in which data is farmed out to a set of workers whose results,
after a number of local updates, are combined at a central master node.
Although such synchronized SGD approaches parallelize well in idealized
computing environments, they often fail to realize their potential speedups in
practice.
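The scheme described above can be sketched in a few lines. The following is a minimal, single-process simulation of synchronized data-parallel SGD, not the paper's implementation: each "worker" runs a few SGD steps on its own data shard, and the "master" averages the resulting parameters before broadcasting them for the next round. The scalar least-squares objective, shard sizes, and step counts are illustrative choices, not taken from the paper.

```python
import random

def local_sgd(w, shard, lr=0.05, steps=10):
    # One worker: refine its copy of w on its own shard
    # (scalar least-squares loss: mean of (x*w - y)^2).
    for _ in range(steps):
        grad = sum(2 * x * (x * w - y) for x, y in shard) / len(shard)
        w -= lr * grad
    return w

def synchronized_round(w, shards, lr=0.05, steps=10):
    # Master broadcasts w; workers update independently;
    # master then averages the returned parameters.
    updates = [local_sgd(w, shard, lr, steps) for shard in shards]
    return sum(updates) / len(updates)

# Toy problem: recover w_true = 2.0 from noiseless linear data
# split across four workers.
random.seed(0)
W_TRUE = 2.0
shards = []
for _ in range(4):
    xs = [random.uniform(-1, 1) for _ in range(25)]
    shards.append([(x, W_TRUE * x) for x in xs])

w = 0.0
for _ in range(40):
    w = synchronized_round(w, shards)
```

In an idealized setting every worker finishes its local steps at the same time, so the averaging barrier is free; in practice, the master must wait for the slowest worker each round, which is one way such schemes lose their parallel efficiency.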