The trade-off between convergence error and communication delay in decentralized stochastic gradient descent (SGD) is dictated by the sparsity of the inter-worker communication graph. In this paper, we propose MATCHA, a decentralized SGD method that samples from a matching decomposition of the base graph so that inter-worker information exchange can proceed in parallel, significantly reducing communication delay. Under standard assumptions and for any general topology, MATCHA achieves this reduction while maintaining the same convergence rate, measured in epochs, as the state of the art. Experiments on a suite of datasets and deep neural networks validate the theoretical analysis and show that the proposed scheme effectively reduces communication delay.
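To make the matching idea concrete, here is a minimal Python sketch. It is an illustration, not the authors' implementation. Assumptions: the matchings come from a simple greedy edge coloring, every matching is activated independently with the same probability `p`, the mixing weight `alpha` is fixed by hand, and each worker holds a scalar model with a toy quadratic loss. All function names (`greedy_matching_decomposition`, `gossip_step`, `run`) are hypothetical. Because the edges of a matching share no endpoints, all of them can exchange information simultaneously, so the sketch models a round's communication delay as the number of activated matchings.

```python
import random


def greedy_matching_decomposition(edges):
    """Split an undirected edge list into disjoint matchings.

    Greedy edge coloring (an assumption, not necessarily the paper's choice):
    each edge joins the first matching in which neither endpoint is used yet.
    This can need up to 2 * max_degree - 1 matchings.
    """
    matchings = []  # each entry: (list of edges, set of nodes already matched)
    for u, v in edges:
        for edge_list, used in matchings:
            if u not in used and v not in used:
                edge_list.append((u, v))
                used.update((u, v))
                break
        else:  # no existing matching can take this edge
            matchings.append(([(u, v)], {u, v}))
    return [edge_list for edge_list, _ in matchings]


def gossip_step(x, matchings, p, alpha, rng):
    """Activate each matching independently with probability p, then average
    parameters along the activated edges.

    Returns the new parameters and the number of activated matchings, which
    (assuming one time unit per matching) is this round's communication delay.
    """
    active = [m for m in matchings if rng.random() < p]
    new_x = list(x)
    for matching in active:
        for u, v in matching:
            diff = x[u] - x[v]  # pre-round values keep the update linear in x
            new_x[u] -= alpha * diff
            new_x[v] += alpha * diff  # equal and opposite: the sum is preserved
    return new_x, len(active)


def run(edges, targets, p=0.5, alpha=0.2, lr=0.1, steps=200, seed=0):
    """Decentralized gradient descent on f_i(x) = 0.5 * (x - targets[i])**2.

    Returns final parameters, total delay used, and the delay that activating
    every matching in every round would have cost.
    """
    rng = random.Random(seed)
    matchings = greedy_matching_decomposition(edges)
    x = [0.0] * len(targets)
    total_delay = 0
    for _ in range(steps):
        # Local step with exact gradients; real SGD would use noisy ones.
        x = [xi - lr * (xi - ti) for xi, ti in zip(x, targets)]
        x, delay = gossip_step(x, matchings, p, alpha, rng)
        total_delay += delay
    return x, total_delay, len(matchings) * steps


if __name__ == "__main__":
    ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
    params, used, full = run(ring, [1.0, 2.0, 3.0, 4.0])
    print(params, used, full)
```

Lowering `p` cuts the expected per-round delay in proportion (about `p` times the number of matchings) at the cost of slower mixing. The abstract's claim is that this saving need not hurt the convergence rate in epochs; this toy does not attempt to verify that.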

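This paper studies the error-runtime trade-off commonly encountered in distributed training and proposes MATCHA, an algorithm that achieves a win-win in this trade-off for any network topology by decomposing the topology into matchings so that nodes can communicate in parallel. Experiments show that MATCHA takes up to 5× less time than vanilla decentralized SGD to reach the same training loss.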

MATCHA: Accelerating Decentralized Stochastic Gradient Descent via Matching Decomposition Sampling