In federated distributed learning, the goal is to optimize a global training
objective defined over distributed devices, where the data shard at each device
is sampled from a possibly different distribution (a.k.a. heterogeneous or
non-i.i.d. data samples). In this paper, we generaliz