Large-scale machine learning problems are now commonly tackled by distributed
optimization algorithms, i.e., algorithms that leverage the compute power of
many devices for training. The communication overhead is a key bottleneck that
hinders perfect scalability. Various recent works proposed