Massive amounts of data have made training large-scale machine learning models on a single worker inefficient. Distributed machine learning methods such as Parallel-SGD have received significant interest as a solution to this problem. However, the performance of distrib