Knowledge distillation (KD) has recently attracted considerable attention in federated learning (FL). Its advantage for FL is that it enables training across heterogeneous clients whose local datasets differ in size and structure. However, data samples across devices are usually not independent and identically distributed (non-IID).
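
To make the distillation objective concrete, the following is a minimal sketch of a standard KD loss in Python with PyTorch; the temperature `T` and the `student_logits`/`teacher_logits` names are illustrative assumptions, not part of the original text, and this is a generic formulation rather than any specific FL method:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # Soften both output distributions with temperature T, then match the
    # student's log-probabilities to the teacher's probabilities via KL divergence.
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T ** 2)
```

Because such a loss only requires exchanging soft predictions rather than model weights, it is one reason KD is considered attractive for clients with heterogeneous data and model structures.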