Histopathology can help clinicians make accurate diagnoses, determine disease prognosis, and plan appropriate treatment strategies. As deep learning techniques prove successful in the medical domain, the primary challenges become limited data availability and concerns about data sharing and privacy. Federated learning has addressed this challenge by training models locally and updating parameters on a server. However, issues, such as domain shift and bias, persist and impact overall performance. Dataset distillation presents an alternative approach to overcoming these challenges. It involves creating a small synthetic dataset that encapsulates essential information, which can be shared without constraints. At present, this paradigm is not practicable as current distillation approaches only generate non human readable representations and exhibit insufficient performance for downstream learning tasks. We train a latent diffusion model and construct a new distilled synthetic dataset with a small number of human readable synthetic images. Selection of maximally informative synthetic images is done via graph community analysis of the representation space. We compare downstream classification models trained on our synthetic distillation data to models trained on real data and reach performances suitable for practical application.

通过使用深度学习技术和数据集蒸馏方法，我们构建了一个小型合成数据集，其中包含最具信息量的人可读的合成图像，用于进行下游分类模型训练，并获得适用于实际应用的性能表现。

组织病理学中的安全数据共享图像提炼