In this paper, we present a distributed variant of an adaptive stochastic
gradient method for training deep neural networks in the parameter-server
model. To reduce the communication cost between the workers and the server, we
incorporate two types of quantization schemes, i.e., gradient quantization