In Federated Learning (FL), accessing private client data incurs communication and privacy costs. As a result, FL deployments commonly prefinetune pretrained foundation models on a (large, possibly public) dataset held by the central server, and then FL-finetune the model on a private, federated dataset held by clients. Evaluating the quality of a prefinetuning dataset reliably and privately is therefore of high importance. To this end, we propose FreD (Federated Private Fréchet Distance), a privately computed distance between a prefinetuning dataset and federated datasets. Intuitively, FreD privately computes a Fréchet distance between the embeddings that a large language model generates on the central (public) dataset and on the federated private client data. To make this computation privacy-preserving, we use distributed, differentially private mean and covariance estimators. We show empirically that FreD accurately predicts the best prefinetuning dataset at minimal privacy cost. Altogether, using FreD we demonstrate a proof of concept for a new approach to private FL training: (1) customize a prefinetuning dataset to better match user data, (2) prefinetune, and (3) perform FL-finetuning.
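
To make the computation described above concrete, the following is a minimal sketch, not the authors' implementation. It assumes that both embedding sets are summarized as Gaussians, so that the squared Fréchet distance has the closed form ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2}). It also assumes that the private moments come from a simple clip-and-add-Gaussian-noise scheme with one embedding per client. The names `frechet_distance`, `private_moments`, `clip_norm`, and `noise_multiplier` are hypothetical, and no privacy accounting is performed here; the estimators in the paper may differ.

```python
import numpy as np


def frechet_distance(mu1, cov1, mu2, cov2):
    """Squared Frechet distance between N(mu1, cov1) and N(mu2, cov2)."""
    # d^2 = ||mu1 - mu2||^2 + Tr(cov1 + cov2 - 2 (cov1 cov2)^{1/2})
    diff = mu1 - mu2
    # cov1^{1/2} via eigendecomposition; negative eigenvalues are clipped to 0.
    w, v = np.linalg.eigh(cov1)
    sqrt_cov1 = (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T
    # cov1^{1/2} cov2 cov1^{1/2} is symmetric and shares eigenvalues with cov1 cov2.
    m = sqrt_cov1 @ cov2 @ sqrt_cov1
    eig = np.clip(np.linalg.eigvalsh((m + m.T) / 2), 0.0, None)
    tr_sqrt = np.sqrt(eig).sum()
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2) - 2.0 * tr_sqrt)


def private_moments(client_embeddings, clip_norm, noise_multiplier, rng):
    """Noisy mean and covariance from one embedding per client (assumed scheme)."""
    x = np.asarray(client_embeddings, dtype=float)
    n, d = x.shape
    # Each client clips its embedding to L2 norm <= clip_norm before reporting.
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    x = x * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Sums the server would receive (e.g. through secure aggregation).
    sum_x = x.sum(axis=0)
    sum_xxT = x.T @ x
    # Gaussian noise scaled to the largest contribution of a single client.
    sum_x = sum_x + rng.normal(0.0, noise_multiplier * clip_norm, size=d)
    noise = rng.normal(0.0, noise_multiplier * clip_norm**2, size=(d, d))
    sum_xxT = sum_xxT + (noise + noise.T) / 2  # symmetrizing is post-processing
    mean = sum_x / n
    cov = sum_xxT / n - np.outer(mean, mean)
    # Project onto the PSD cone so the matrix square root is well defined.
    w, v = np.linalg.eigh((cov + cov.T) / 2)
    cov = (v * np.clip(w, 0.0, None)) @ v.T
    return mean, cov


# Usage with synthetic stand-ins for LLM embeddings (illustrative data only).
rng = np.random.default_rng(0)
public = rng.normal(size=(500, 8))             # central (public) dataset
clients = rng.normal(loc=0.5, size=(500, 8))   # one embedding per client
mu_pub = public.mean(axis=0)
cov_pub = np.cov(public, rowvar=False, bias=True)
mu_priv, cov_priv = private_moments(clients, clip_norm=10.0,
                                    noise_multiplier=1.0, rng=rng)
score = frechet_distance(mu_pub, cov_pub, mu_priv, cov_priv)
```

Under these assumptions, a lower `score` would indicate that the candidate prefinetuning dataset sits closer to the client data in embedding space, so several candidate datasets could be ranked by computing one score each against the same private moments.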

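Introduces FreD, a privacy-preserving method for federated learning. It uses distributed, differentially private mean and covariance estimators to compute the Fréchet distance between the embeddings that a large language model generates on the central (public) dataset and on the federated private client data, so that the quality of a prefinetuning dataset can be evaluated reliably and privately. Using FreD, the work demonstrates a proof of concept for a new approach to private FL training: (1) customize the prefinetuning dataset to better match user data, (2) prefinetune, and (3) perform FL-finetuning.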

Privately customizing prefinetuning in federated learning to better match user data