Nowadays large-scale distributed machine learning systems have been deployed
to support various analytics and intelligence services in IT firms. To train a
large dataset and derive the prediction/inference model, e.g., a deep neural
network, multiple workers are run in parallel to trai